SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
1IBM Cloud / Data & AI / © 2019 IBM Corporation
Making the Most of Data in
Multiple Data Sources
(with Virtual Data Lakes)
Nagapriya Tiruthani
Offering Manager, IBM Db2 Big SQL
ntiruth@us.ibm.com
Prasad Pandit
Offering Management, Big Data & Db2
pprasad@us.ibm.com
Please
note
IBM’s statements regarding its plans, directions, and intent are subject to change
or withdrawal without notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment,
promise, or legal obligation to deliver any material, code or functionality. Information about
potential future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in
a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon many factors, including considerations such as the
amount of multiprogramming in the user’s job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an
individual user will achieve results similar to those stated here.
IBM Cloud / Data & AI / © 2019 IBM Corporation
2
2
“Data will be your
basis of competitive advantage”
- Ginni Rometty
Businesses are becoming more data-driven.
IBM Cloud / Data & AI / © 2019 IBM Corporation 3
Ideas. Insights. Innovation.
Data Driven Insight Driven Digital Transformation
Culture Change
Breaking Silos
Discover “what”
Understand “why”
Self Service
Reports
Business Intelligence
Cost Reduction
Modernization
Prediction
Optimization
Automation
Collaboration
Models
Visualization
Applications
New Business Models
Disruptive Technology
Real-Time Decisions
AI
Multi-Cloud
Outcomes
Capabilities
Drivers
As data becomes more ACCESSIBLE it provides more VALUE
Most are here
Competitive Market
Leader
Value from Data
IBM Cloud / Data & AI / © 2019 IBM Corporation 4
IBM delivers the capabilities you
need to build a ladder to AI
IBM Cloud / © 2018 IBM Corporation
AI
Collect relevant data and make it
simple & accessible
Organize data so it can be trusted
Analyze insights on demand
Embed machine
learning everywhere
5
Live data is needed about customer behavior, buying
preferences, and sentiments to create more timely, relevant
offers
Businesses need to leverage all of their enterprise data in
order to make accurate decisions and properly analyze risk
Why Live Data?
IBM Cloud / Data & AI / © 2019 IBM Corporation
6
IBM Cloud / Data & AI / © 2019 IBM Corporation
Customers expect innovative and personalized digital
experiences….driven by modern databases
Personalized
and Engaging
(360* view)
Location Aware On Demand
IBM Cloud / Data & AI / © 2019 IBM Corporation
7
Current Challenges
Data is proliferating, often stored in
different locations and formats.
It’s getting more difficult to provide
data access and analytics to the
business.
8IBM Cloud / Data & AI / © 2019 IBM Corporation
95% organizations see negative impacts from
poor data quality, resulting in wasted resources and
additional costs
40%companies are struggling to find and
retain talent to help them uncover meaningful
insights from all the data they own and collect
Most companies only analyze
12%
of the data they have
IBM Cloud / Data & AI / © 2019 IBM Corporation
Db2 Big SQL offers a Unified Platform for Data Engineering and
Data Science
9IBM Cloud / Data & AI / © 2019 IBM Corporation
DATA
VIRTUALIZATION
Get an integrated
and live view of all
data across your
enterprise
SQL compatibility with
many SQL dialects,
enables reuse of talent
and applications
INDUSTRY LEADING
SQL ENGINE FOR
BIG DATA
ADVANCED
ANALYTICS AND ML
MADE SIMPLE
BI and data science
tools can access data
across the enterprise
for insights
FAST, INTEGRATED and SECURE DATA ACCESS LAYER
Db2 Big SQL
Data lakes Object stores NoSQL RDBMS
Modern Data Warehouse
IBM Cloud / Data & AI / © 2019 IBM Corporation
10
10
IBM Db2 Big SQL
IBM Db2 Big SQL, an advanced SQL engine on Hadoop,
supercharges your analytical workloads on data lakes with
no vendor lock-in.
The core capabilities of Db2 Big SQL focusses on data virtualization, SQL
compatibility, scalability, performance, and of course enterprise
security/governance, making it a desirable query engine to seek insights from
disparate data sources including Hadoop
IBM Cloud / Data & AI / © 2019 IBM Corporation 11
11
IBM Cloud / Data & AI / © 2019 IBM Corporation
Here’s What Db2 Big SQL Offers
Infusing the power of
Db2 to Hadoop
Bi-directional
integration with Spark
Supports all open
source file formats – no
vendor lock-in
Innovation
Mature SQL engine for
complex SQLs
Performance at scale
Optimize resource
utilization
Performance
Reuse applications and
skills
Centralized Hive
metastore
Accelerate reports with
MQTs
Enable BI tools reporting
on Hadoop
Flexibility
Query data where it
resides with no data
movement
Virtualize siloed data
across organization
Enrich streaming data
with data at rest
Confluence
Deep analytics with a
single point of entry
Access Big Data from
notebooks via Big SQL
Machine learning using
SQL
Cognitive
Db2 Big SQL provides the stability that applications and users need
with changing platforms 12
12
IBM Cloud / Data & AI / © 2019 IBM Corporation
No Vendor
Lock-in &
Seamless
Interoperability
with Hive
• Db2 Big SQL preserves open source foundation
• Separation of compute and storage - Data is part of Hadoop
• Alternate SQL execution engine
• Hive and Db2 Big SQL share common metadata
• Db2 Big SQL can invoke native Hive UDFs efficiently using
Parameter Style Hive
• Security and Governance can be done with Ranger/Atlas
SQL Execution Engines
Open Source Storage Model
CSV Parquet ORC
Others
…
Tab
Delim.
Hive Metastore
(open source)
Db2 Big SQL
(IBM)
Hive
(Open
Source)
13
Innovation
13
Self Tuning
Memory Manager
World Class
Cost Based Optimizer Query rewrite
Advanced Statistics
Native Row &
Columnar stores
Elastic Boost
SQL Compatibility
Hardened runtime
Advanced
Workload manager
Here’s why Db2 Big SQL can get you the best execution for complex queries and many
concurrent users with high performance
Materialized Query
Tables
Infuse the Power of Db2 in Big Data
Performance Concurrent users Complex query
IBM Cloud / Data & AI / © 2019 IBM Corporation
Performance
IBM Cloud / Data & AI / © 2019 IBM Corporation
Data warehouse offload to Hadoop is now made easy:
• Write one, run anywhere…
• Easy porting of applications
• Reuse skills of DBAs/ developers who know ANSI SQL
Db2 Big SQL is the best platform for
offloading
Oracle Data Marts and Warehouses
to Hadoop
Move Applications without Re-tooling
15
Flexibility
15
IBM Cloud / Data & AI / © 2019 IBM Corporation
Virtualized
environment to
query
heterogeneous
data
Only SQL-on-Hadoop to virtualizes more than 10 different data sources: RDBMS, NoSQL,
HDFS or Object Store with efficient query rewrite and predicate pushdown
MS SQL
Server
Netezza
(PDA) Oracle PostgreSQL Teradata
DB2
LUW, Db2z, DB2 on iInformix
WebHDFS
Object Store
(S3)
Hive HBase HDFS
CDH or HDP
Db2 Big SQL
NoSQL
ML
Model
Confluence
16
Transparent
§ Appears to be one source
§ Programmers don’t need to know how
/ where data is stored
Heterogeneous
§ Accesses data from diverse sources
High Function
§ Full query support against all data
§ Capabilities of sources as well
Autonomous
§ Non-disruptive to data sources,
existing applications, systems.
High Performance
§ Optimization of distributed queries
SQLtools&applications
Datasources
Db2 Big SQL
Oracle
SQL
Server
Teradata
DB2
Others..
SAP Hana
WebHDFS
S3
What does Db2 Big SQL Federation get you?
IBM Cloud / Data & AI / © 2019 IBM Corporation
Confluence
17
Bi-directional
integration with
Spark
• Db2 Big SQL is a self-tuning memory management SQL engine
that integrates with Spark in the platform
• Spark is a powerful analytic co-processor that complements the
rich SQL functionality of Big SQL
• Tight and scalable integration with Spark
• Enables exploiting Spark capabilities through Db2 Big SQL
• Operationalize ML models using SQL skills with Big SQL
Bi-directional
integration
allows Spark
jobs to be
executed from
Big SQL
Tight integration
with Spark
enables Big SQL
worker and
Spark Executor
to communicate
in memory
without writing
to disk
Db2
Big SQL
Share data
in memory
HDFS
Big SQL worker
Spark executor
IBM Cloud / Data & AI / © 2019 IBM Corporation 18
Cognitive
18
For more details check the blog: https://developer.ibm.com/hadoop/2017/11/07/ibm-big-sql-machine-learning-demo/
Operationalize Machine Learning Models using SQL
IBM Cloud / Data & AI / © 2019 IBM Corporation 19
Cognitive
19
Value Delivered to Customers
Db2
Big SQL
No vendor
lock-in
Query siloed
data across
organization
Combine
streaming
data with
data at rest
Reporting
with BI
tools
Operationali
ze ML model
with SQL
Accelerated
reporting
using MQTs
for
federated
data
Reuse
applications
& skills after
data offload
to Hadoop
• SQL compatibility with Oracle, Db2,
Netezza
• Applications / reports can be easily
ported to Hadoop
• Accelerated analytical
reports for historical data &
federated data
• Faster response for BI
• Invoke Big SQL directly from
Notebooks and make it easy for
data scientists to wrangle data
• Invoke Spark models directly from
Big SQL – make it easy for data
engineers to operationalize the
model
• BI tools (Tableau, Birst, etc.) have bad
performance when put directly on Hadoop
• Generate complex and star schema queries
• Enrich data lake with social media
data
• Add social sentiment data, click
stream data, log data or
unstructured data
• Overall provides a richer 360
degree view of customer
• Federation / Virtualization
• Avoid forced consolidation into Hadoop
• Make use of Hadoop infrastructure
• Centralized security & governance
• Separation of Compute from Hadoop storage (data is always
in Hadoop)
• Fully ANSI SQL engine – queries and skills can easily be
reused
20
Use Cases – Modernize EDW
A standardized (single) point of entry to your data lake
Db2 Big SQL
ANALYTICSDATASYSTEMS
Data
Marts
Business
Analytics
Visualization
& Dashboards
Systems of Record
RDBMS
ERP
CRM
Other
Clickstream Web & Social Geolocation Sensor
& Machine
Server
Logs
Unstructured
NEWSOURCES
HDP
ELT
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
N
Cold Data,
Deeper Archive
& New Sources
Enterprise Data
Warehouse
Hot
CDH
ELT
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
N
Cold Data,
Deeper Archive
& New Sources
• Access cold/archived
data on Hadoop
• Augment/enrich with
other data sources
with no data
movement
• Reuse SQL existing
skills
• Migrate applications to
Big Data from
traditional databases
• Standardize the data
environment for users
and applications with
changing platforms
Think 2019 / Session 4112 / Feb 13, 2019 / © 2019 IBM Corporation 21
Bloor Research - Db2 Big SQL is the Leader in SQL-on-Hadoop
https://www.bloorresearch.com/wp-content/uploads/2018/06/SQL-Engines-on-Hadoop-by-Philip-Howard-Bloor-Free-Download.pdf
IBM Cloud / Data & AI / © 2019 IBM Corporation
22
22
IBM Cloud / Data & AI / © 2019 IBM Corporation
Proven!
…
Customers: 200+
Data Scale: TBs (no architectural limit)
Cluster Size: <= 20 (BigInsights/ IOP)
<= 100 (HDP + migrations)
> 100 (new HDP customers)
Users: <= 100 (concurrent)
Big SQL 3.0 First Released April, 2014
Big SQL 4.0, 2015 (IOP)/ Big SQL 5.0 (HDP), 2017
Big SQL 6.0 (HDP 3.1), 2019
Big SQL 5.x (CDH 5.x), Q2 ‘19/ Big SQL 6.x (CDH 6.x), Q4 ‘19
-minimum 2 feature packs/ year + (on demand) cumulative fixpacks-
Source: Gartner Interactive Report, 2019
23
24IBM Cloud / Data & AI / © 2019 IBM Corporation
Customer Quotes
Big SQL’s response time for
queries is faster than others
99% of the time.
- American retail giant
Using Big SQL over HBase
enables fast key lookups to
meet end-to-end response
time of ~2secs requirement
- US national bank
High performance even
when 100s of users are
actively accessing the
datasets concurrently –
noticeably 4-5 times faster
than Hive
- US east coast bank
Modernized data warehouse
to run traditional data
warehouse analytics
(advanced SQL) using Big
SQL
- European bank
Only Big SQL could run all the
reports, unmodified on Day 1 of POC
when data was offloaded from
Oracle
- Software consulting company
Big SQL completely
CRASHED the
competition!
- Services company
Saved time and money by
reusing Netezza applications
on data offloaded to Hadoop
using Big SQL
- US utility company
IBM Cloud / Data & AI / © 2019 IBM Corporation
Db2 Big SQL on CDH
Db2 Big SQL
Start/Stop Db2 Big SQL in
Cloudera Manager
Manage security using
Apache Sentry
25
25
Enhanced Performance for Object Store
Protocols Supported: S3
Create Tables over Data residing in Object Store directly (no copy required into Hadoop)
Once configured, Object Store tables work like any other table in Big SQL
Benefits:
– No need to copy data into Hadoop first! Query data where it resides.
– Partitioning supported!
Tradeoff:
– Expect reduced performance relative to Hive tables
CREATE HADOOP TABLE staff ( … )
LOCATION
's3a://s3atables/staff';
LOAD FROM
Object Store
also
supported!
26IBM Cloud / Data & AI / © 2019 IBM Corporation
IBM Cloud / Data & AI / © 2019 IBM Corporation
Expand analytics to Blockchain data with
other enterprise data set (Hadoop, Spark,
RDBMs) for complex use cases without
additional work
• Empower end-users to query Blockchain data using
SQL seamlessly, like any other database
• Adopt blockchain as a source to perform
analytics/reporting while integrating with other data
stores to give 360 view of information
• Leveraging Db2 Big SQL’s federation to offer SQL
querying capabilities on Blockchain while preserving
the blockchain security model
Blockchain
Traditional
Databases
Reporting Tools
Data Access
Layer
Newer data
stores
Query Blockchain using SQL (Tech Preview)
LedgerSQL
Db2 Big SQL
NoSQL
27
27
Above all, bring stability to your applications even when the platform goes through updates......
Make Db2 Big SQL the point of entry to Big Data irrespective of which
platform has the data
Accelerate time to
market while
modernizing your
warehouse
Empower SQL users
to operationalize ML
models
Augment disparate
data for deep
analytics and AI
PERFORMANCE
SECURITY
Enable BI analytics
with high
performance and
enterprise security
IBM Cloud / Data & AI / © 2019 IBM Corporation
Summary
With Db2 Big SQL, users can focus on what they want to do, and not worry about how it is executed..
28
IBM Cloud / Data & AI / © 2019 IBM Corporation
Digital Transformation begins with a Single Step
Align Business & IT
Goals
Improve outcomes
Reduce costs
Empower end-users
1. Use all data
to deliver deeper insights and enhanced
data-driven decision making
2. Reduce TCO and risk
by integrating with existing investments
for greater ROI
3. Provide flexibility
in tools and applications for improved
productivity
Solution
29
29
Above all, bring stability to your applications even when the platform goes through updates......
Make Db2 Big SQL the point of entry to Big Data irrespective of which
platform has the data
Accelerate time to
market while
modernizing your
warehouse
Empower SQL users
to operationalize ML
models
Augment disparate
data for deep
analytics and AI
PERFORMANCE
SECURITY
Enable BI analytics
with high
performance and
enterprise security
IBM Cloud / Data & AI / © 2019 IBM Corporation
Summary
With Db2 Big SQL, users can focus on what they want to do, and not worry about how it is executed..
30
IBM Cloud / Data & AI / © 2019 IBM Corporation
DEMO
Technologies that help increase
business benefits and time to value
31
31
Anita is a data scientist at an e-commerce company PerfectGiftsRUs. She
is tasked with running retail analytics. On this Monday morning, Anita’s
boss has just asked her to look into company data to provide better
insights into customer behavior and purchases that can be used to
improve sales and inventory forecasts.
The Monday Morning Challenge
3232IBM Cloud / Data & AI / © 2019 IBM Corporation
Anita’s Challenges – Performing Data Science over complete data which
are siloed
CRM + Product
Catalog
Point of SalesData Lake with
Historical Sales data
3333IBM Cloud / Data & AI / © 2019 IBM Corporation
In the past, Anita’s attempts at merging information from multiple data
silos have been faced with the hassles of designing complex data
pipelines to access latest data for analysis. Disrupting overworked DBAs
is not what Anita has in mind.
What are Anita’s Options?
3434IBM Cloud / Data & AI / © 2019 IBM Corporation
How IBM and Cloudera advance Client’s Analytics Journey
Big Data
Persistent Storage
Hortonworks
Data Platform
IBM
Db2 Big SQL
Big Data
Access Layer
IBM Watson Studio
(previously DSX)
Data Science and
Machine Learning IDE
35IBM Cloud / Data & AI / © 2019 IBM Corporation
CUSTOMER
CUSTOMER
ID
COUNTRY
Anita’s Solution?
PRODUCT
PRODUCT ID
PRODUCT
DESC
INVOICE
ID
PRODUCT
ID
PRODUCT
NAME
STO
CK
COD
E
PRODUCT
DESC
Customer
ID
Country Unit price
312 30001 Herb maker
basil
951 Cooking
Gear
Wrewr UK 56
100 30004 TrailChef Cup 951 Cooking
Gear
Wrewr USA 100
451 30006 TrailChef
Deluxe Cook
951 Cooking
Gear
hdfgfdg CANADA 20
INVOICE
ID
PRODU
CT ID
PRODUCT
NAME
STO
CK
COD
E
QUA
NTIT
Y
312 30001 Herb maker
basil
951 15
100 30004 TrailChef
Cup
951 1
451 30006 TrailChef
Deluxe
Cook
951 5
Db2 Big SQL provides
a unified view of the
combined data
Db2 Big SQL federates
access to the two data
sources
Watson Studio (DSX)
can run its machine
learning and analytics
on the combined data
using Db2 Big SQL
3636IBM Cloud / Data & AI / © 2019 IBM Corporation
Anita saved the day. Using Big SQL’s analytics and federation
capabilities, she was able to generate accurate insights on the latest data
– just in time for the big sales meeting. Her boss was thrilled!
The Monday Afternoon Save
37IBM Cloud / Data & AI / © 2019 IBM Corporation
© 2018 International Business Machines Corporation. No part of this
document may be reproduced or transmitted in any form without
written permission from IBM.
U.S. Government Users Restricted Rights — use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to
products that have not yet been announced by IBM) has been reviewed
for accuracy as of the date of initial publication and could include
unintentional technical or typographical errors. IBM shall have no
responsibility to update this information. This document is distributed
“as is” without any warranty, either express or implied. In no event,
shall IBM be liable for any damage arising from the use of this
information, including but not limited to, loss of data, business
interruption, loss of profit or loss of opportunity. IBM products and
services are warranted per the terms and conditions of the agreements
under which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.”
Any statements regarding IBM's future direction, intent or product
plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a
controlled, isolated environments. Customer examples are presented as
illustrations of how those
customers have used IBM products and the results they may have
achieved. Actual performance, cost, savings or other results in other
operating environments may vary.
References in this document to IBM products, programs, or services does
not imply that IBM intends to make such products, programs or services
available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared
by independent session speakers, and do not necessarily reflect the views
of IBM. All materials and discussions are provided for informational
purposes only, and are neither intended to, nor shall constitute legal or
other guidance or advice to any individual participant or their specific
situation.
It is the customer’s responsibility to insure its own compliance with legal
requirements and to obtain advice of competent legal counsel as to
the identification and interpretation of any relevant laws and regulatory
requirements that may affect the customer’s business and any actions the
customer may need to take to comply with such laws. IBM does not
provide legal advice or represent or warrant that its services or products
will ensure that the customer follows any law.
Notices and disclaimers
IBM Cloud / Data & AI / © 2019 IBM Corporation
Think 2019 / Session 4112 / Feb 13, 2019 / © 2019 IBM Corporation 38
38
Information concerning non-IBM products was
obtained from the suppliers of those products, their
published announcements or other publicly
available sources. IBM has not tested
those products about this publication and cannot
confirm the accuracy of performance, compatibility
or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM
products should be addressed to the suppliers of
those products. IBM does not warrant the quality of
any third-party products, or the ability of any such
third-party products to interoperate with IBM’s
products. IBM expressly disclaims all warranties,
expressed or implied, including but not limited to,
the implied warranties of merchantability and
fitness for a purpose.
The provision of the information contained herein is
not intended to, and does not, grant any right or
license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM products
and services used in the presentation] are trademarks of International
Business Machines Corporation, registered in many jurisdictions
worldwide. Other product and service names might be trademarks of IBM
or other companies. A current list of IBM trademarks is available on
the Web at "Copyright and trademark information" at:
www.ibm.com/legal/copytrade.shtml.
.
Notices and disclaimers continued
IBM Cloud / Data & AI / © 2019 IBM Corporation
Think 2019 / Session 4112 / Feb 13, 2019 / © 2019 IBM Corporation 39
39
IBM Cloud / Data & AI / © 2019 IBM Corporation 40

Weitere ähnliche Inhalte

Was ist angesagt?

A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
DataWorks Summit
 

Was ist angesagt? (20)

Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
The Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine LearningThe Future of Data Warehousing, Data Science and Machine Learning
The Future of Data Warehousing, Data Science and Machine Learning
 
Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3Cloud Innovation Day - Commonwealth of PA v11.3
Cloud Innovation Day - Commonwealth of PA v11.3
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning Everywhere
 
OpenPOWER Update
OpenPOWER UpdateOpenPOWER Update
OpenPOWER Update
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
Delivering Data Science to the Business
Delivering Data Science to the BusinessDelivering Data Science to the Business
Delivering Data Science to the Business
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
 
Sidecars and a Microservices Mesh
Sidecars and a Microservices MeshSidecars and a Microservices Mesh
Sidecars and a Microservices Mesh
 
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
OpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORALOpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORAL
 
Hadoop for the Masses
Hadoop for the MassesHadoop for the Masses
Hadoop for the Masses
 
Audi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid CloudAudi‘s Hadoop Journey into the Hybrid Cloud
Audi‘s Hadoop Journey into the Hybrid Cloud
 

Ähnlich wie Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)

Liberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and DatabricksLiberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and Databricks
Precisely
 
Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015
Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015
Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015
Yakura Coffee
 

Ähnlich wie Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes) (20)

Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
Nrb Mainframe Day z Data and AI - Leif Pedersen
Nrb Mainframe Day z Data and AI - Leif PedersenNrb Mainframe Day z Data and AI - Leif Pedersen
Nrb Mainframe Day z Data and AI - Leif Pedersen
 
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
IBM THINK 2020 - Cloud Data Lake with IBM Cloud Data Services
 
Ibm db2update2019 icp4 data
Ibm db2update2019   icp4 dataIbm db2update2019   icp4 data
Ibm db2update2019 icp4 data
 
Liberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and DatabricksLiberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and Databricks
 
Breaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over HadoopBreaching the 100TB Mark with SQL Over Hadoop
Breaching the 100TB Mark with SQL Over Hadoop
 
A journey to faster, repeatable data commercialization
A journey to faster, repeatable data commercializationA journey to faster, repeatable data commercialization
A journey to faster, repeatable data commercialization
 
IMS08 the momentum driving the ims future
IMS08   the momentum driving the ims futureIMS08   the momentum driving the ims future
IMS08 the momentum driving the ims future
 
BLU Acceleration on the Cloud – 101
BLU Acceleration on the Cloud – 101BLU Acceleration on the Cloud – 101
BLU Acceleration on the Cloud – 101
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Future of Power: IBM Trends & Directions - Erik Rex
Future of Power: IBM Trends & Directions - Erik RexFuture of Power: IBM Trends & Directions - Erik Rex
Future of Power: IBM Trends & Directions - Erik Rex
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
Db2 tools
Db2 toolsDb2 tools
Db2 tools
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
Libera la potenza del Machine Learning
Libera la potenza del Machine LearningLibera la potenza del Machine Learning
Libera la potenza del Machine Learning
 
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudIBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud
 
Benchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herdBenchmarking Hadoop - Which hadoop sql engine leads the herd
Benchmarking Hadoop - Which hadoop sql engine leads the herd
 
Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015
Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015
Deploying IBM WebSphere Application Server to the Cloud_GWC_3-24-2015
 
Spark working with a Cloud IDE: Notebook/Shiny Apps
Spark working with a Cloud IDE: Notebook/Shiny AppsSpark working with a Cloud IDE: Notebook/Shiny Apps
Spark working with a Cloud IDE: Notebook/Shiny Apps
 

Mehr von DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes)

  • 1. 1IBM Cloud / Data & AI / © 2019 IBM Corporation Making the Most of Data in Multiple Data Sources (with Virtual Data Lakes) Nagapriya Tiruthani Offering Manager, IBM Db2 Big SQL ntiruth@us.ibm.com Prasad Pandit Offering Management, Big Data & Db2 pprasad@us.ibm.com
  • 2. Please note IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. IBM Cloud / Data & AI / © 2019 IBM Corporation 2 2
  • 3. “Data will be your basis of competitive advantage” - Ginni Rometty Businesses are becoming more data-driven. IBM Cloud / Data & AI / © 2019 IBM Corporation 3 Ideas. Insights. Innovation.
  • 4. Data Driven Insight Driven Digital Transformation Culture Change Breaking Silos Discover “what” Understand “why” Self Service Reports Business Intelligence Cost Reduction Modernization Prediction Optimization Automation Collaboration Models Visualization Applications New Business Models Disruptive Technology Real-Time Decisions AI Multi-Cloud Outcomes Capabilities Drivers As data becomes more ACCESSIBLE it provides more VALUE Most are here Competitive Market Leader Value from Data IBM Cloud / Data & AI / © 2019 IBM Corporation 4
  • 5. IBM delivers the capabilities you need to build a ladder to AI IBM Cloud / © 2018 IBM Corporation AI Collect relevant data and make it simple & accessible Organize data so it can be trusted Analyze insights on demand Embed machine learning everywhere 5
  • 6. Live data is needed about customer behavior, buying preferences, and sentiments to create more timely, relevant offers Businesses need to leverage all of their enterprise data in order to make accurate decisions and properly analyze risk Why Live Data? IBM Cloud / Data & AI / © 2019 IBM Corporation 6
  • 7. IBM Cloud / Data & AI / © 2019 IBM Corporation Customers expect innovative and personalized digital experiences….driven by modern databases Personalized and Engaging (360* view) Location Aware On Demand IBM Cloud / Data & AI / © 2019 IBM Corporation 7
  • 8. Current Challenges Data is proliferating, often stored in different locations and formats. It’s getting more difficult to provide data access and analytics to the business. 8IBM Cloud / Data & AI / © 2019 IBM Corporation 95% organizations see negative impacts from poor data quality, resulting in wasted resources and additional costs 40%companies are struggling to find and retain talent to help them uncover meaningful insights from all the data they own and collect Most companies only analyze 12% of the data they have IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 9. Db2 Big SQL offers a Unified Platform for Data Engineering and Data Science 9IBM Cloud / Data & AI / © 2019 IBM Corporation DATA VIRTUALIZATION Get an integrated and live view of all data across your enterprise SQL compatibility with many SQL dialects, enables reuse of talent and applications INDUSTRY LEADING SQL ENGINE FOR BIG DATA ADVANCED ANALYTICS AND ML MADE SIMPLE BI and data science tools can access data across the enterprise for insights FAST, INTEGRATED and SECURE DATA ACCESS LAYER
  • 10. Db2 Big SQL Data lakes Object stores NoSQL RDBMS Modern Data Warehouse IBM Cloud / Data & AI / © 2019 IBM Corporation 10 10
  • 11. IBM Db2 Big SQL IBM Db2 Big SQL, an advanced SQL engine on Hadoop, supercharges your analytical workloads on data lakes with no vendor lock-in. The core capabilities of Db2 Big SQL focusses on data virtualization, SQL compatibility, scalability, performance, and of course enterprise security/governance, making it a desirable query engine to seek insights from disparate data sources including Hadoop IBM Cloud / Data & AI / © 2019 IBM Corporation 11 11
  • 12. IBM Cloud / Data & AI / © 2019 IBM Corporation Here’s What Db2 Big SQL Offers Infusing the power of Db2 to Hadoop Bi-directional integration with Spark Supports all open source file formats – no vendor lock-in Innovation Mature SQL engine for complex SQLs Performance at scale Optimize resource utilization Performance Reuse applications and skills Centralized Hive metastore Accelerate reports with MQTs Enable BI tools reporting on Hadoop Flexibility Query data where it resides with no data movement Virtualize siloed data across organization Enrich streaming data with data at rest Confluence Deep analytics with a single point of entry Access Big Data from notebooks via Big SQL Machine learning using SQL Cognitive Db2 Big SQL provides the stability that applications and users need with changing platforms 12 12
  • 13. IBM Cloud / Data & AI / © 2019 IBM Corporation No Vendor Lock-in & Seamless Interoperability with Hive • Db2 Big SQL preserves open source foundation • Separation of compute and storage - Data is part of Hadoop • Alternate SQL execution engine • Hive and Db2 Big SQL share common metadata • Db2 Big SQL can invoke native Hive UDFs efficiently using Parameter Style Hive • Security and Governance can be done with Ranger/Atlas SQL Execution Engines Open Source Storage Model CSV Parquet ORC Others … Tab Delim. Hive Metastore (open source) Db2 Big SQL (IBM) Hive (Open Source) 13 Innovation 13
  • 14. Self Tuning Memory Manager World Class Cost Based Optimizer Query rewrite Advanced Statistics Native Row & Columnar stores Elastic Boost SQL Compatibility Hardened runtime Advanced Workload manager Here’s why Db2 Big SQL can get you the best execution for complex queries and many concurrent users with high performance Materialized Query Tables Infuse the Power of Db2 in Big Data Performance Concurrent users Complex query IBM Cloud / Data & AI / © 2019 IBM Corporation Performance
  • 15. IBM Cloud / Data & AI / © 2019 IBM Corporation Data warehouse offload to Hadoop is now made easy: • Write one, run anywhere… • Easy porting of applications • Reuse skills of DBAs/ developers who know ANSI SQL Db2 Big SQL is the best platform for offloading Oracle Data Marts and Warehouses to Hadoop Move Applications without Re-tooling 15 Flexibility 15
  • 16. IBM Cloud / Data & AI / © 2019 IBM Corporation Virtualized environment to query heterogeneous data Only SQL-on-Hadoop to virtualizes more than 10 different data sources: RDBMS, NoSQL, HDFS or Object Store with efficient query rewrite and predicate pushdown MS SQL Server Netezza (PDA) Oracle PostgreSQL Teradata DB2 LUW, Db2z, DB2 on iInformix WebHDFS Object Store (S3) Hive HBase HDFS CDH or HDP Db2 Big SQL NoSQL ML Model Confluence 16
  • 17. Transparent § Appears to be one source § Programmers don’t need to know how / where data is stored Heterogeneous § Accesses data from diverse sources High Function § Full query support against all data § Capabilities of sources as well Autonomous § Non-disruptive to data sources, existing applications, systems. High Performance § Optimization of distributed queries SQLtools&applications Datasources Db2 Big SQL Oracle SQL Server Teradata DB2 Others.. SAP Hana WebHDFS S3 What does Db2 Big SQL Federation get you? IBM Cloud / Data & AI / © 2019 IBM Corporation Confluence 17
  • 18. Bi-directional integration with Spark • Db2 Big SQL is a self-tuning memory management SQL engine that integrates with Spark in the platform • Spark is a powerful analytic co-processor that complements the rich SQL functionality of Big SQL • Tight and scalable integration with Spark • Enables exploiting Spark capabilities through Db2 Big SQL • Operationalize ML models using SQL skills with Big SQL Bi-directional integration allows Spark jobs to be executed from Big SQL Tight integration with Spark enables Big SQL worker and Spark Executor to communicate in memory without writing to disk Db2 Big SQL Share data in memory HDFS Big SQL worker Spark executor IBM Cloud / Data & AI / © 2019 IBM Corporation 18 Cognitive 18
  • 19. For more details check the blog: https://developer.ibm.com/hadoop/2017/11/07/ibm-big-sql-machine-learning-demo/ Operationalize Machine Learning Models using SQL IBM Cloud / Data & AI / © 2019 IBM Corporation 19 Cognitive 19
  • 20. Value Delivered to Customers Db2 Big SQL No vendor lock-in Query siloed data across organization Combine streaming data with data at rest Reporting with BI tools Operationali ze ML model with SQL Accelerated reporting using MQTs for federated data Reuse applications & skills after data offload to Hadoop • SQL compatibility with Oracle, Db2, Netezza • Applications / reports can be easily ported to Hadoop • Accelerated analytical reports for historical data & federated data • Faster response for BI • Invoke Big SQL directly from Notebooks and make it easy for data scientists to wrangle data • Invoke Spark models directly from Big SQL – make it easy for data engineers to operationalize the model • BI tools (Tableau, Birst, etc.) have bad performance when put directly on Hadoop • Generate complex and star schema queries • Enrich data lake with social media data • Add social sentiment data, click stream data, log data or unstructured data • Overall provides a richer 360 degree view of customer • Federation / Virtualization • Avoid forced consolidation into Hadoop • Make use of Hadoop infrastructure • Centralized security & governance • Separation of Compute from Hadoop storage (data is always in Hadoop) • Fully ANSI SQL engine – queries and skills can easily be reused 20
  • 21. Use Cases – Modernize EDW A standardized (single) point of entry to your data lake Db2 Big SQL ANALYTICSDATASYSTEMS Data Marts Business Analytics Visualization & Dashboards Systems of Record RDBMS ERP CRM Other Clickstream Web & Social Geolocation Sensor & Machine Server Logs Unstructured NEWSOURCES HDP ELT ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N Cold Data, Deeper Archive & New Sources Enterprise Data Warehouse Hot CDH ELT ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N Cold Data, Deeper Archive & New Sources • Access cold/archived data on Hadoop • Augment/enrich with other data sources with no data movement • Reuse SQL existing skills • Migrate applications to Big Data from traditional databases • Standardize the data environment for users and applications with changing platforms Think 2019 / Session 4112 / Feb 13, 2019 / © 2019 IBM Corporation 21
  • 22. Bloor Research - Db2 Big SQL is the Leader in SQL-on-Hadoop https://www.bloorresearch.com/wp-content/uploads/2018/06/SQL-Engines-on-Hadoop-by-Philip-Howard-Bloor-Free-Download.pdf IBM Cloud / Data & AI / © 2019 IBM Corporation 22 22
  • 23. IBM Cloud / Data & AI / © 2019 IBM Corporation Proven! … Customers: 200+ Data Scale: TBs (no architectural limit) Cluster Size: <= 20 (BigInsights/ IOP) <= 100 (HDP + migrations) > 100 (new HDP customers) Users: <= 100 (concurrent) Big SQL 3.0 First Released April, 2014 Big SQL 4.0, 2015 (IOP)/ Big SQL 5.0 (HDP), 2017 Big SQL 6.0 (HDP 3.1), 2019 Big SQL 5.x (CDH 5.x), Q2 ‘19/ Big SQL 6.x (CDH 6.x), Q4 ‘19 -minimum 2 feature packs/ year + (on demand) cumulative fixpacks- Source: Gartner Interactive Report, 2019 23
  • 24. 24IBM Cloud / Data & AI / © 2019 IBM Corporation Customer Quotes Big SQL’s response time for queries is faster than others 99% of the time. - American retail giant Using Big SQL over HBase enables fast key lookups to meet end-to-end response time of ~2secs requirement - US national bank High performance even when 100s of users are actively accessing the datasets concurrently – noticeably 4-5 times faster than Hive - US east coast bank Modernized data warehouse to run traditional data warehouse analytics (advanced SQL) using Big SQL - European bank Only Big SQL could run all the reports, unmodified on Day 1 of POC when data was offloaded from Oracle - Software consulting company Big SQL completely CRASHED the competition! - Services company Saved time and money by reusing Netezza applications on data offloaded to Hadoop using Big SQL - US utility company
  • 25. IBM Cloud / Data & AI / © 2019 IBM Corporation Db2 Big SQL on CDH Db2 Big SQL Start/Stop Db2 Big SQL in Cloudera Manager Manage security using Apache Sentry 25 25
  • 26. Enhanced Performance for Object Store Protocols Supported: S3 Create Tables over Data residing in Object Store directly (no copy required into Hadoop) Once configured, Object Store tables work like any other table in Big SQL Benefits: – No need to copy data into Hadoop first! Query data where it resides. – Partitioning supported! Tradeoff: – Expect reduced performance relative to Hive tables CREATE HADOOP TABLE staff ( … ) LOCATION 's3a://s3atables/staff'; LOAD FROM Object Store also supported! 26IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 27. IBM Cloud / Data & AI / © 2019 IBM Corporation Expand analytics to Blockchain data with other enterprise data set (Hadoop, Spark, RDBMs) for complex use cases without additional work • Empower end-users to query Blockchain data using SQL seamlessly, like any other database • Adopt blockchain as a source to perform analytics/reporting while integrating with other data stores to give 360 view of information • Leveraging Db2 Big SQL’s federation to offer SQL querying capabilities on Blockchain while preserving the blockchain security model Blockchain Traditional Databases Reporting Tools Data Access Layer Newer data stores Query Blockchain using SQL (Tech Preview) LedgerSQL Db2 Big SQL NoSQL 27 27
  • 28. Above all, bring stability to your applications even when the platform goes through updates...... Make Db2 Big SQL the point of entry to Big Data irrespective of which platform has the data Accelerate time to market while modernizing your warehouse Empower SQL users to operationalize ML models Augment disparate data for deep analytics and AI PERFORMANCE SECURITY Enable BI analytics with high performance and enterprise security IBM Cloud / Data & AI / © 2019 IBM Corporation Summary With Db2 Big SQL, users can focus on what they want to do, and not worry about how it is executed.. 28
  • 29. IBM Cloud / Data & AI / © 2019 IBM Corporation Digital Transformation begins with a Single Step Align Business & IT Goals Improve outcomes Reduce costs Empower end-users 1. Use all data to deliver deeper insights and enhanced data-driven decision making 2. Reduce TCO and risk by integrating with existing investments for greater ROI 3. Provide flexibility in tools and applications for improved productivity Solution 29 29
  • 30. Above all, bring stability to your applications even when the platform goes through updates...... Make Db2 Big SQL the point of entry to Big Data irrespective of which platform has the data Accelerate time to market while modernizing your warehouse Empower SQL users to operationalize ML models Augment disparate data for deep analytics and AI PERFORMANCE SECURITY Enable BI analytics with high performance and enterprise security IBM Cloud / Data & AI / © 2019 IBM Corporation Summary With Db2 Big SQL, users can focus on what they want to do, and not worry about how it is executed.. 30
  • 31. IBM Cloud / Data & AI / © 2019 IBM Corporation DEMO Technologies that help increase business benefits and time to value 31 31
  • 32. Anita is a data scientist at an e-commerce company PerfectGiftsRUs. She is tasked with running retail analytics. On this Monday morning, Anita’s boss has just asked her to look into company data to provide better insights into customer behavior and purchases that can be used to improve sales and inventory forecasts. The Monday Morning Challenge 3232IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 33. Anita’s Challenges – Performing Data Science over complete data which are siloed CRM + Product Catalog Point of SalesData Lake with Historical Sales data 3333IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 34. In the past, Anita’s attempts at merging information from multiple data silos have been faced with the hassles of designing complex data pipelines to access latest data for analysis. Disrupting overworked DBAs is not what Anita has in mind. What are Anita’s Options? 3434IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 35. How IBM and Cloudera advance Client’s Analytics Journey Big Data Persistent Storage Hortonworks Data Platform IBM Db2 Big SQL Big Data Access Layer IBM Watson Studio (previously DSX) Data Science and Machine Learning IDE 35IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 36. CUSTOMER CUSTOMER ID COUNTRY Anita’s Solution? PRODUCT PRODUCT ID PRODUCT DESC INVOICE ID PRODUCT ID PRODUCT NAME STO CK COD E PRODUCT DESC Customer ID Country Unit price 312 30001 Herb maker basil 951 Cooking Gear Wrewr UK 56 100 30004 TrailChef Cup 951 Cooking Gear Wrewr USA 100 451 30006 TrailChef Deluxe Cook 951 Cooking Gear hdfgfdg CANADA 20 INVOICE ID PRODU CT ID PRODUCT NAME STO CK COD E QUA NTIT Y 312 30001 Herb maker basil 951 15 100 30004 TrailChef Cup 951 1 451 30006 TrailChef Deluxe Cook 951 5 Db2 Big SQL provides a unified view of the combined data Db2 Big SQL federates access to the two data sources Watson Studio (DSX) can run its machine learning and analytics on the combined data using Db2 Big SQL 3636IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 37. Anita saved the day. Using Big SQL’s analytics and federation capabilities, she was able to generate accurate insights on the latest data – just in time for the big sales meeting. Her boss was thrilled! The Monday Afternoon Save 37IBM Cloud / Data & AI / © 2019 IBM Corporation
  • 38. © 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights — use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event, shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.” Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law. Notices and disclaimers IBM Cloud / Data & AI / © 2019 IBM Corporation Think 2019 / Session 4112 / Feb 13, 2019 / © 2019 IBM Corporation 38 38
  • 39. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products about this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml. . Notices and disclaimers continued IBM Cloud / Data & AI / © 2019 IBM Corporation Think 2019 / Session 4112 / Feb 13, 2019 / © 2019 IBM Corporation 39 39
  • 40. IBM Cloud / Data & AI / © 2019 IBM Corporation 40