IBM Cloud Query
Introduction and Roadmap
Session 1480
Torsten Steinbach, IBM Cloud Architect
@torsstei
Chris Glew, IBM Cloud Offering Manager
Innovation in Big Data Analytics
The 1990s: Enterprise Data Warehouses
• Tightly integrated and optimized systems
The 2000s: Hadoop
• Introduced open data formats & easy scaling on commodity HW
Today: Serverless Analytics-aaS
• Seamless elasticity
• Pay-per-query consumption
• Analyze data as it sits in an object store
• Disaggregated architecture
• No more infrastructure headaches
IBM Cloud Query
Think 2018 / DOC ID / Month XX, 2018 / © 2018 IBM Corporation
• Server-less ANSI SQL queries of open data formats on cloud object storage
• Pay per query
• Free-of-charge beta now available to everyone via the IBM Cloud catalog
• Provision & run your first query in < 1 minute
• Put your data in cloud object storage and immediately query it
• No database load
• Dynamic schema inference
• Turns cold storage into a live workspace for big data analytics
IBM Cloud Query
IBM Cloud SQL Query – Very High Level Architecture (MVP 1Q 2018)
IBM Cloud Query Architecture
[Diagram: an application and IBM Cloud Query around IBM Cloud Object Storage, which holds the data sets and the result set]
1. Submit SQL
2. Read data
3. Write results
4. Read results
Land: IBM Cloud Streaming (IBM Streams, Message Hub, Watson IoT)
Archive / Export: IBM Cloud Databases (Db2 on Cloud)
IBM Cloud Query – Access Patterns
[Diagram: SQL Query usage in four stages (Create, Explore, Integrate, Deploy)]
• SQL Web Console
• Watson Studio Notebooks
• SQL REST API
• SQL Cloud Function
Introducing Query Web Console
• Reference source data in Cloud Object Storage
• Location in Cloud Object Storage for the result set
• Details of the SQL execution
Table Locators
cos://<endpoint>/<bucket>/[<prefix>]
Endpoint – of your object storage bucket or a short alias
E.g. s3.us-south.objectstorage.softlayer.net or us-south
Bucket – name in object storage
Prefix – one or multiple objects (e.g., table partitions) with same prefix
Used in FROM clauses for input data and in target field for result set data
Examples:
cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet
cos://us-geo/otherBucket/myData
cos://us-geo/otherBucket/myData/part
cos://eu-geo/newBucket/
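A minimal sketch of how a table locator of the form cos://<endpoint>/<bucket>/[<prefix>] could be split into its parts. The names TableLocator and parse_table_locator are hypothetical helpers for illustration, not part of any IBM library, and the sketch does not resolve short endpoint aliases such as us-south.

```python
from typing import NamedTuple


class TableLocator(NamedTuple):
    endpoint: str  # full endpoint host name or a short alias such as "us-south"
    bucket: str    # bucket name in object storage
    prefix: str    # object prefix; empty when the locator names the whole bucket


def parse_table_locator(url: str) -> TableLocator:
    """Split a cos:// table locator into endpoint, bucket and prefix."""
    scheme = "cos://"
    if not url.startswith(scheme):
        raise ValueError("table locator must start with cos://")
    # Everything up to the first "/" is the endpoint, the next segment is the bucket.
    endpoint, _, rest = url[len(scheme):].partition("/")
    bucket, _, prefix = rest.partition("/")
    if not endpoint or not bucket:
        raise ValueError("table locator needs an endpoint and a bucket")
    return TableLocator(endpoint, bucket, prefix)
```

Because the prefix is matched against object names, a locator such as cos://us-geo/otherBucket/myData/part can address several objects (for example table partitions) at once.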
<Table Locator> [STORED AS CSV | PARQUET | JSON]
• Specifies the data format of the input data
• Table schema is automatically inferred at SQL execution time
• The clause is optional; the default is CSV
Table Formats
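As an illustration of the optional STORED AS clause, the following sketch assembles the table reference used in a FROM clause. The helper names table_reference and select_all are hypothetical; only the three formats named on the slide are accepted, and omitting the format leaves the clause out so that the CSV default applies.

```python
from typing import Optional

# Formats listed on the slide for the STORED AS clause.
SUPPORTED_FORMATS = ("CSV", "PARQUET", "JSON")


def table_reference(locator: str, data_format: Optional[str] = None) -> str:
    """Return '<locator>' or '<locator> STORED AS <FORMAT>'."""
    if data_format is None:
        # Clause omitted: the service falls back to CSV.
        return locator
    fmt = data_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: " + data_format)
    return locator + " STORED AS " + fmt


def select_all(locator: str, data_format: Optional[str] = None) -> str:
    """Build a simple SELECT over one table locator."""
    return "SELECT * FROM " + table_reference(locator, data_format)
```

For example, select_all("cos://us-geo/otherBucket/myData", "parquet") yields a statement that reads the objects under that prefix as Parquet.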
Submit a SQL query
POST https://sql-api.ng.bluemix.net/v2-beta/sql_jobs
Runs the SQL in the background and returns a job_id
Detailed info for a SQL query (e.g. status, result location)
GET https://sql-api.ng.bluemix.net/v2-beta/sql_jobs/{job_id}
Returns JSON with query execution details
List of recent SQL query executions
GET https://sql-api.ng.bluemix.net/v2-beta/sql_jobs
Returns JSON array with last 30 SQL submissions and outcomes
IBM Cloud Query REST API
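The three calls above can be sketched with the Python standard library. The slide gives only the HTTP methods and URLs, so the JSON body fields ("statement", "resultset_target") and the bearer-token Authorization header are assumptions, and the function names are hypothetical. The sketch only builds the requests; sending them (for example with urllib.request.urlopen) requires valid credentials.

```python
import json
import urllib.parse
import urllib.request

# Beta endpoint as shown on the slide.
BASE_URL = "https://sql-api.ng.bluemix.net/v2-beta/sql_jobs"


def _headers(token):
    # Assumption: the service expects a bearer token and exchanges JSON.
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": "Bearer " + token,
    }


def build_submit_request(statement, token, result_target=None):
    """POST /sql_jobs: submit a SQL statement; the service answers with a job_id."""
    body = {"statement": statement}  # assumed field name
    if result_target is not None:
        body["resultset_target"] = result_target  # assumed field name
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL, data=data, headers=_headers(token), method="POST")


def build_status_request(job_id, token):
    """GET /sql_jobs/{job_id}: execution details such as status and result location."""
    url = BASE_URL + "/" + urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(url, headers=_headers(token), method="GET")


def build_list_request(token):
    """GET /sql_jobs: the most recent SQL submissions and their outcomes."""
    return urllib.request.Request(BASE_URL, headers=_headers(token), method="GET")
```

A client would typically submit a job, keep the returned job_id, and poll the status request until the job has finished before reading the result set from object storage.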
Open source Python client
!pip install ibmcloudsql
Convenient programmatic access
• You only provide: API key, SQL query and location URL for SQL result
• Result set written to object storage and returned as pandas data frame
• Useful methods for SQL job status & SQL history
Use Watson Studio Notebook with Python kernel
• Interactive SQL submission and result visualization using PixieDust widgets
IBM Cloud Query in Watson Studio
IBM Cloud Functions: IBM’s function-aaS for running event-based custom logic in any language
SQL Query adds scale-out data processing functions
Server-less + Server-less = Server-lesser
Example: automated data processing pipelines
bx wsk action create mysql --docker ibmfunctions/sqlquery
+ Bind parameters for SQL statement text and result target location
[Diagram: Cloud Function (server-less & event-driven function execution & orchestration) and SQL Execution Service (server-less & scale-out, Spark)]
Use for analyzing application logs
[Diagram: your cloud application or solution writes logs to IBM Cloud Object Storage, where Query works on them]
• Transform: compress, aggregate, repartition
• Analyze: anomaly detection, user segmentation, customer support, resource planning
• Build & run data pipelines and analytics of your log message data
• Flexible log data analytics with full power of SQL
• Seamless scalability & elasticity according to your log message volume
Use to explore and preprocess data for BI
[Diagram: a three-stage flow (Acquire, Process, Report)]
• Acquire: applications and IoT streaming devices land data in IBM Cloud Object Storage
• Process: Query cleanses, filters, merges, aggregates and compresses the data, and promotes it to data warehouses & databases (Db2 on Cloud)
• Report: BI reporting tools (Watson Studio, Looker, Cognos, Tableau) read and explore the data
IBM Cloud Query – Strategic Architecture
[Diagram: an application sends SQL to Query, which reads data sets from and writes result sets to IBM Cloud Object Storage]
Components shown:
• Query with data skipping, geospatial support and table metadata
• IBM Cloud Databases (Db2 on Cloud): read and write
• Watson Knowledge Catalog: register and read SQL queries, metadata and data sets
• IBM Cloud Streaming (IBM Streams, Message Hub, Watson IoT)
Roadmap: Location Analytics on IoT Data
[Diagram: mobile devices, cars and other devices land GPS location data in IBM Cloud Object Storage; Query applies SQL/MM location filtering and spatial aggregation for location analytics]
Location data is a native SQL data type
• Points (e.g. current location)
• Lines (e.g. GPS track, road)
• Polygons (e.g. zip code area)
Native SQL functions to accurately aggregate, filter and join based on location
Full geodesic globe support. No projection incorrectness
• E.g. well suited for oil & gas data in polar regions
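To illustrate why geodesic support matters near the poles, the sketch below compares a great-circle (haversine) distance with a naive calculation that treats latitude and longitude degrees as a flat grid. This is a stand-alone illustration on a spherical Earth model, not the SQL/MM functions of the service; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Distance if one degree had the same length in every direction everywhere."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180
    return km_per_degree * math.hypot(lat2 - lat1, lon2 - lon1)


# One degree of longitude at the equator and at 80 degrees north.
equator = haversine_km(0.0, 0.0, 0.0, 1.0)
polar = haversine_km(80.0, 0.0, 80.0, 1.0)
polar_naive = naive_planar_km(80.0, 0.0, 80.0, 1.0)
```

At the equator both calculations agree, while at 80 degrees north the flat-grid value overstates the distance several times over, which is the kind of error a geodesic implementation avoids.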
Roadmap: Data Skipping Index for huge potential cost and time savings
Think 2018 / March 21, 2018 / © 2018 IBM Corporation
[Diagram: SQL Query and Cloud Object Storage]
Analytics and storage are independent microservices
Critical cost factors include bytes shipped and number of REST calls
Reduce both with a metadata index for objects in a data set
Enable Spark SQL to query the metadata index to determine which objects are not relevant to a query
Example: Detect SLA violation on log messages
SELECT acct, count(CASE WHEN status > 499…) AS err_cnt, …
FROM cos://…-logs… /february STORED AS parquet
WHERE acct IN ('3c04affe-… -bluemix')
GROUP BY acct, dt, hour(timegen), minute(timegen)
...
Default CSV: 41x faster with data skipping*
Optimized Parquet: 6x faster with data skipping*
*Results are data and query dependent.
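The idea of a metadata index can be sketched in a few lines. The slides do not say what the index stores; this sketch assumes per-object minimum and maximum values of a single numeric column, and the names ObjectStats, build_index and relevant_objects are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ObjectStats(NamedTuple):
    min_value: int
    max_value: int


def build_index(objects: Dict[str, List[int]]) -> Dict[str, ObjectStats]:
    """Record the min and max column value per object (assumed index content)."""
    return {
        name: ObjectStats(min(values), max(values))
        for name, values in objects.items()
        if values  # empty objects contribute nothing and are never relevant
    }


def relevant_objects(index: Dict[str, ObjectStats], low: int, high: int) -> List[str]:
    """Names of objects whose [min, max] range overlaps the queried range [low, high]."""
    return [
        name
        for name, stats in index.items()
        if stats.max_value >= low and stats.min_value <= high
    ]


# Toy data set: HTTP status codes stored in three objects.
dataset = {
    "obj1": [200, 201, 204],
    "obj2": [200, 404, 503],
    "obj3": [500, 502],
}
index = build_index(dataset)
# Equivalent of the predicate status > 499: only objects that may contain 5xx codes.
to_read = relevant_objects(index, 500, 599)
```

Here obj1 can be skipped without being read, which saves both the bytes shipped and the REST calls needed to fetch it. An object that passes the index check may still contain no matching rows, so the query itself must still filter.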
Data Skipping Architecture Flow
[Diagram in three steps, each showing a Spark driver and Spark workers over a dataset of objects (obj1, obj2, obj3)]
1. Create Index is issued
2. Indexing produces object metadata
3. A query uses the indexed object metadata
Data Skipping – Create Index
Index Statistics Before Usage
Using the Index
Index Statistics After Usage
Thank you!
Torsten Steinbach
torsten@de.ibm.com
Chris Glew
cglew@us.ibm.com
Please note
IBM’s statements regarding its plans, directions, and intent are subject to change
or withdrawal without notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment,
promise, or legal obligation to deliver any material, code or functionality. Information about
potential future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for
our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in
a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon many factors, including considerations such as the
amount of multiprogramming in the user’s job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an
individual user will achieve results similar to those stated here.
Notices and disclaimers
Think 2018 / January 12, 2018 / © 2018 IBM Corporation
© 2018 International Business Machines Corporation. No part of this
document may be reproduced or transmitted in any form without
written permission from IBM.
U.S. Government Users Restricted Rights — use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to
products that have not yet been announced by IBM) has been reviewed
for accuracy as of the date of initial publication and could include
unintentional technical or typographical errors. IBM shall have no
responsibility to update this information. This document is distributed
“as is” without any warranty, either express or implied. In no event,
shall IBM be liable for any damage arising from the use of this
information, including but not limited to, loss of data, business
interruption, loss of profit or loss of opportunity. IBM products and
services are warranted per the terms and conditions of the agreements
under which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product
plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in
controlled, isolated environments. Customer examples are presented as
illustrations of how those customers have used IBM products and the
results they may have
achieved. Actual performance, cost, savings or other results in other
operating environments may vary.
References in this document to IBM products, programs, or services
do not imply that IBM intends to make such products, programs or
services available in all countries in which IBM operates or does
business.
Workshops, sessions and associated materials may have been prepared
by independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for informational
purposes only, and are neither intended to, nor shall constitute legal or
other guidance or advice to any individual participant or their specific
situation.
It is the customer’s responsibility to ensure its own compliance with legal
requirements and to obtain advice of competent legal counsel as to
the identification and interpretation of any relevant laws and regulatory
requirements that may affect the customer’s business and any actions
the customer may need to take to comply with such laws. IBM does not
provide legal advice or represent or warrant that its services or products
will ensure that the customer follows any law.
Notices and disclaimers
continued
Information concerning non-IBM products was obtained from the
suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products. Questions
on the capabilities of non-IBM products should be addressed to the
suppliers of those products. IBM does not warrant the quality of any
third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM expressly disclaims all
warranties, expressed or implied, including but not limited to, the
implied warranties of merchantability and fitness for a purpose.
The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM
products and services used in the presentation] are trademarks of
International Business Machines Corporation, registered in many
jurisdictions worldwide. Other product and service names might
be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark
information" at: www.ibm.com/legal/copytrade.shtml.
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 

IBM THINK 2018 - IBM Cloud SQL Query Introduction

  • 1. IBM Cloud Query: Introduction and Roadmap. Session 1480. Torsten Steinbach, IBM Cloud Architect (@torsstei); Chris Glew, IBM Cloud Offering Manager
  • 2. Innovation in Big Data Analytics
      • The 1990s: Enterprise Data Warehouses. Tightly integrated and optimized systems.
      • 2000: Hadoop. Introduced open data formats and easy scaling on commodity hardware.
      • Today: Serverless Analytics-aaS. Seamless elasticity; pay-per-query consumption; analyze data as it sits in an object store; disaggregated architecture; no more infrastructure headaches.
  • 3. IBM Cloud Query (Think 2018 / DOC ID / Month XX, 2018 / © 2018 IBM Corporation)
      • Server-less ANSI SQL queries of open data formats on cloud object storage
      • Pay per query
      • Free-of-charge beta now available to everyone via the IBM Cloud catalog
  • 4. IBM Cloud Query
      • Provision and run your first query in under 1 minute
      • Put your data in cloud object storage and query it immediately
      • No database load
      • Dynamic schema inference
      • Turns cold storage into a live workspace for big data analytics
  • 5. IBM Cloud Query Architecture: IBM Cloud SQL Query – Very High Level Architecture (MVP 1Q 2018)
      [Diagram: query flow between the application, SQL Query, and IBM Cloud Object Storage (data sets and result set): 1. Submit SQL; 2. Read data; 3. Write results; 4. Read results]
      [Diagram: connected services are IBM Cloud Streaming (IBM Streams, Message Hub, Watson IoT) and IBM Cloud Databases (Db2 on Cloud); connection labels: Land, Archive / Export]
  • 6. IBM Cloud Query – Access Patterns
      [Diagram: SQL Query usage through the SQL Web Console, Watson Studio Notebooks, the SQL REST API, and the SQL Cloud Function; labels: Create, Query, Integrate, Explore, Deploy]
  • 7. Introducing Query Web Console
      [Screenshot callouts: reference source data in Cloud Object Storage; location in Cloud Object Storage for the result set; details of the SQL execution]
  • 8. Table Locators
      cos://<endpoint>/<bucket>/[<prefix>]
      • Endpoint: the endpoint of your object storage bucket, or a short alias, e.g. s3.us-south.objectstorage.softlayer.net or us-south
      • Bucket: the bucket name in object storage
      • Prefix: one or multiple objects (e.g., table partitions) with the same prefix
      • Used in FROM clauses for input data and in the target field for result set data
      Examples:
      cos://us-south/myBucket/myFolder/mySubFolder/myData.parquet
      cos://us-geo/otherBucket/myData
      cos://us-geo/otherBucket/myData/part
      cos://eu-geo/newBucket/
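To make the locator layout on slide 8 concrete, here is a minimal Python sketch that splits a locator into its three parts. The helper name `parse_table_locator` and the `TableLocator` tuple are hypothetical and not part of any IBM library; the sketch only mirrors the `cos://<endpoint>/<bucket>/[<prefix>]` pattern given on the slide.

```python
from typing import NamedTuple


class TableLocator(NamedTuple):
    """Hypothetical container for the three parts of a cos:// locator."""
    endpoint: str
    bucket: str
    prefix: str  # empty string when no prefix is given


def parse_table_locator(url: str) -> TableLocator:
    """Split cos://<endpoint>/<bucket>/[<prefix>] into its parts."""
    scheme = "cos://"
    if not url.startswith(scheme):
        raise ValueError("table locator must start with cos://")
    # Split into at most three pieces: endpoint, bucket, and the rest (prefix).
    parts = url[len(scheme):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("table locator needs an endpoint and a bucket")
    prefix = parts[2] if len(parts) == 3 else ""
    return TableLocator(parts[0], parts[1], prefix)


if __name__ == "__main__":
    print(parse_table_locator("cos://us-geo/otherBucket/myData/part"))
```

The prefix is kept as one opaque string because, per the slide, it may match one object or many objects that share it.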
  • 9. Table Formats
      <Table Locator> [STORED AS CSV | PARQUET | JSON]
      • Specifies the data format of the input data
      • Table schema is automatically inferred at SQL execution time
      • The clause is optional; the default is CSV
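As an illustration of the optional STORED AS clause on slide 9, the following Python sketch assembles the table reference for a FROM clause. The helper `table_reference` is hypothetical; it only encodes what the slide states: three formats, and CSV as the default when the clause is omitted.

```python
from typing import Optional

# Formats listed on the slide.
SUPPORTED_FORMATS = ("CSV", "PARQUET", "JSON")


def table_reference(locator: str, data_format: Optional[str] = None) -> str:
    """Return '<locator>' or '<locator> STORED AS <FORMAT>'.

    Omitting data_format leaves the clause out, in which case the service
    treats the input as CSV according to the slide.
    """
    if data_format is None:
        return locator
    fmt = data_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {data_format}")
    return f"{locator} STORED AS {fmt}"


if __name__ == "__main__":
    ref = table_reference("cos://us-geo/otherBucket/myData", "parquet")
    print(f"SELECT * FROM {ref}")
```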
  • 10. IBM Cloud Query REST API
      • Submit a SQL query: POST https://sql-api.ng.bluemix.net/v2-beta/sql_jobs. Runs the SQL in the background and returns a job_id.
      • Detailed info for a SQL query (e.g. status, result location): GET https://sql-api.ng.bluemix.net/v2-beta/sql_jobs/{job_id}. Returns JSON with query execution details.
      • List of recent SQL query executions: GET https://sql-api.ng.bluemix.net/v2-beta/sql_jobs. Returns a JSON array with the last 30 SQL submissions and their outcomes.
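The three calls on slide 10 can be sketched in Python as request objects. This is a sketch under stated assumptions: the slide gives only the methods and URLs, so the JSON body fields (`statement`, `resultset_target`) and the bearer-token header are assumptions, and the helper names are hypothetical. The requests are built but not sent, since the beta endpoint from the slide may no longer be reachable.

```python
import json
import urllib.request

# Endpoint taken from the slide.
BASE_URL = "https://sql-api.ng.bluemix.net/v2-beta/sql_jobs"


def _auth_headers(token: str) -> dict:
    # Assumption: the API expects an IAM bearer token.
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}


def build_submit_request(statement: str, resultset_target: str, token: str):
    """POST /sql_jobs: submit SQL; the response is expected to carry a job_id."""
    # Assumption: field names of the request body are not given on the slide.
    body = json.dumps(
        {"statement": statement, "resultset_target": resultset_target}
    ).encode("utf-8")
    headers = _auth_headers(token)
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL, data=body, headers=headers, method="POST")


def build_status_request(job_id: str, token: str):
    """GET /sql_jobs/{job_id}: execution details such as status and result location."""
    return urllib.request.Request(
        f"{BASE_URL}/{job_id}", headers=_auth_headers(token), method="GET"
    )


def build_list_request(token: str):
    """GET /sql_jobs: recent submissions and their outcomes."""
    return urllib.request.Request(BASE_URL, headers=_auth_headers(token), method="GET")


if __name__ == "__main__":
    req = build_submit_request(
        "SELECT * FROM cos://us-geo/otherBucket/myData",
        "cos://us-geo/otherBucket/results",
        "MY_TOKEN",
    )
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it; a client would typically poll the status request until the job finishes and then read the result from the reported location.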
  • 11. IBM Cloud Query in Watson Studio
      • Open source Python client: !pip install ibmcloudsql
      • Convenient programmatic access
          • You only provide: API key, SQL query, and location URL for the SQL result
          • Result set is written to object storage and returned as a pandas data frame
          • Useful methods for SQL job status and SQL history
      • Use a Watson Studio Notebook with the Python kernel
          • Interactive SQL submission and result visualization using PixieDust widgets
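The notebook workflow on slide 11 (submit, wait, fetch the result) might look roughly like the sketch below. The slide does not show the client's API, so the method names `logon`, `submit_sql`, `wait_for_job`, and `get_result` are assumptions based on the ibmcloudsql package's documented interface and should be checked against the installed version. `StubClient` is a hypothetical stand-in so that the sketch runs without credentials or network access.

```python
def run_query(client, sql_text):
    """Submit SQL, wait for the background job, and return the result.

    `client` is expected to behave like an ibmcloudsql.SQLQuery object
    (assumed interface, see the note above).
    """
    client.logon()
    job_id = client.submit_sql(sql_text)
    client.wait_for_job(job_id)
    return client.get_result(job_id)


class StubClient:
    """Hypothetical offline stand-in that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def logon(self):
        self.calls.append("logon")

    def submit_sql(self, sql_text):
        self.calls.append("submit_sql")
        return "job-1"

    def wait_for_job(self, job_id):
        self.calls.append("wait_for_job")
        return "completed"

    def get_result(self, job_id):
        # The real client is described as returning a pandas data frame;
        # a plain list keeps this sketch dependency-free.
        self.calls.append("get_result")
        return [("acct-1", 3)]


if __name__ == "__main__":
    print(run_query(StubClient(), "SELECT * FROM cos://us-geo/otherBucket/myData"))
```

In a notebook, the stub would be replaced by a real client constructed from the API key and the result location URL named on the slide; the exact constructor arguments depend on the library version.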
  • 12. Cloud Function
      • IBM Cloud Functions: IBM’s function-aaS for running event-based custom logic in any language
      • SQL Query adds scale-out data processing functions
      • Server-less + Server-less = Server-lesser
      • Example: automated data processing pipelines
      bx wsk action create mysql --docker ibmfunctions/sqlquery
      + Bind parameters for SQL statement text and result target location
      [Diagram: server-less & scale-out (Spark) SQL execution service; server-less & event-driven function execution & orchestration]
  • 13. Use for analyzing application logs
      [Diagram elements: Your Cloud Application/Solution; Logs; IBM Cloud Object Storage; Query (Transform, Compress, Aggregate, Repartition); Analyze (Anomaly Detection, User Segmentation, Customer Support, Resource Planning)]
      • Build and run data pipelines and analytics of your log message data
      • Flexible log data analytics with the full power of SQL
      • Seamless scalability and elasticity according to your log message volume
  • 14. Use to explore and preprocess data for BI
      [Diagram elements: Applications; IoT; Streaming; Devices; IBM Cloud Object Storage; Query; Data Warehouses & Databases (Db2 on Cloud); BI Reporting. Stage labels: Acquire, Process, Report. Connection labels: Land, Promote, Read, Explore. Query operations: Cleanse, Filter, Merge, Aggregate, Compress. Tools shown: Watson Studio, Looker, Cognos, Tableau]
  • 15. IBM Cloud Query – Strategic Architecture
      [Diagram elements: Application; SQL Query with Data Skipping and Geospatial SQL; IBM Cloud Object Storage (data sets, result set, table metadata); IBM Cloud Databases (Db2 on Cloud); Watson Knowledge Catalog (SQL queries, metadata, data sets); IBM Cloud Streaming (IBM Streams, Message Hub, Watson IoT). Connection labels: Read, Write, Register]
  • 16. Roadmap: Location Analytics on IoT Data
      [Diagram elements: Mobile; Cars; Devices; GPS; Land; Location Data; IBM Cloud Object Storage; Query; SQL/MM; Location Analytics (Location Filtering, Spatial Aggregation)]
      • Location data is a native SQL data type
          • Points (e.g. current location)
          • Lines (e.g. GPS track, road)
          • Polygons (e.g. zip code area)
      • Native SQL functions to accurately aggregate, filter, and join based on location
      • Full geodesic globe support, with no projection errors
          • E.g. well suited for oil & gas data in polar regions
  • 17. Roadmap: Data Skipping Index for huge potential cost and time savings (Think 2018 / March 21, 2018 / © 2018 IBM Corporation)
      [Diagram elements: Cloud Object Storage; SQL Query]
      • Analytics and storage are independent microservices
      • Critical cost factors include bytes shipped and the number of REST calls
      • Reduce both with a metadata index for the objects in a data set
      • Enable Spark SQL to query the metadata index to determine which objects are not relevant to a query
      • Example: detect SLA violations in log messages
          SELECT acct, count(CASE WHEN status > 499…) AS err_cnt
          FROM cos://…-logs… /february STORED AS parquet
          WHERE acct IN ('3c04affe-… -bluemix')
          GROUP BY acct, dt, hour(timegen), minute(timegen) ...
      • Default CSV: 41x faster with data skipping*
      • Optimized Parquet: 6x faster with data skipping*
      * Results are data and query dependent.
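The idea of consulting a metadata index to rule out objects before reading them can be sketched in a few lines of Python. The slide does not say what the index stores, so this sketch assumes a simple per-object min/max summary of one column; the function names and sample values are hypothetical and are not the service's implementation.

```python
from typing import Dict, List, Tuple


def build_minmax_index(objects: Dict[str, List[int]]) -> Dict[str, Tuple[int, int]]:
    """Record (min, max) of a column for every non-empty object in a data set."""
    return {name: (min(vals), max(vals)) for name, vals in objects.items() if vals}


def objects_to_read(index: Dict[str, Tuple[int, int]], low: int, high: int) -> List[str]:
    """Keep only objects whose [min, max] range overlaps the queried range.

    Objects that cannot contain a matching value are skipped, which saves
    both bytes shipped and REST calls against object storage.
    """
    return sorted(
        name for name, (lo, hi) in index.items() if hi >= low and lo <= high
    )


if __name__ == "__main__":
    # Hypothetical HTTP status codes stored in three log objects.
    status_by_object = {
        "obj1": [200, 201, 204],
        "obj2": [200, 404, 500],
        "obj3": [301, 302],
    }
    index = build_minmax_index(status_by_object)
    # Equivalent of the predicate status > 499 for integer status codes.
    print(objects_to_read(index, 500, 599))
```

With these sample values only `obj2` can contain a status above 499, so the other two objects would never be fetched.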
  • 18. Data Skipping Architecture Flow
      [Diagram, Create Index: Spark Driver; Spark Workers; dataset with obj1, obj2, obj3]
  • 19. Data Skipping Architecture Flow
      [Diagram, Create Index: Spark Driver; Spark Workers; dataset with obj1, obj2, obj3; object metadata; indexing metadata]
  • 20. Data Skipping Architecture Flow
      [Diagram, Query: Spark Driver; Spark Workers; dataset with obj1, obj2, obj3; object metadata; indexing metadata]
  • 21. Data Skipping – Create Index
  • 22. Index Statistics Before Usage
  • 23. Using the Index
  • 24. Index Statistics After Usage
  • 25. Thank you! Torsten Steinbach (torsten@de.ibm.com), Chris Glew (cglew@us.ibm.com)
  • 26. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  • 27. Notices and disclaimers (Think 2018 / January 12, 2018 / © 2018 IBM Corporation) © 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights: use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
  • 28. Notices and disclaimers continued: Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products about this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.