Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud

SingleStore

Data & Analytics

The Data Warehouse Blueprint for
ML, AI, and Hybrid Cloud
@garyorenstein @memsql
MemSQL 1

Today’s Talk
A Data Warehouse Blueprint for
• Machine Learning and Artificial Intelligence
• Hybrid Cloud
Live demonstration of machine learning in SQL
• K_means clustering
MemSQL 2

Demonstration Step 1
1. Launch cluster
2. Set up k_means functions with MemSQL extensibility
3. Load data
4. Train data
5. Gain insights
• important_tags.sql
• representative_channels.sql
MemSQL 3

The Real-Time Data Warehouse
for the front lines of your business
MemSQL 5

What is a real-time data warehouse?
Similar to an
“Operational Data Warehouse”
MemSQL 6

A Real-Time Data Warehouse
• Adds real-time to analytics
• Reduces latency and ETL
• Manages structured data, loaded continuously
• Supports real-time decisions with embedded analytics
• Serves as an operational data store
• Delivers low latency reporting with automated queries
MemSQL 7

MemSQL: A Real-Time Data Warehouse
Streaming, Live and Historical Data
Immediate Insights with SQL
Scalable and distributed
MemSQL 8

Sequel Pro Client and MemSQL Cluster
MemSQL 9

MemSQL
#1 Operational
Data Warehouse in
2016
MemSQL 12

MemSQL
Top
“non-megavendor”
Operational
Data Warehouse
in 2017
MemSQL 13

Digital Transformation
is data based
MemSQL 16

Digital Transformation
database
MemSQL 17

MemSQL is also a top ranked
database by Gartner
MemSQL 18

MemSQL
Top
“non-megavendor”
HTAP Database in
2016
MemSQL 19

What is the advantage of being in
both the data warehouse and
database magic quadrants?
MemSQL 20

...you can’t do AI without
machine learning. You also can’t
do machine learning without
analytics, and you can’t do analytics
without data infrastructure.
— Hilary Mason, Data Scientist
MemSQL 22

Demonstration Step 2 and 3
1. Launch cluster
2. Set up k_means functions with MemSQL extensibility
3. Load data
4. Train data
5. Gain insights
• important_tags.sql
• representative_channels.sql
MemSQL 23

Over a billion users
Almost 1/3 of all people on the
Internet
Every day those users watch a
billion hours of video, generating
billions of views.
MemSQL 26

Videos have tags
What can they tell us?
MemSQL 27

YouTube Tags Data Set
Channel, Video, Tag
(Gary’s Channel, GO Video 1, hi)
(Gary’s Channel, GO Video 1, hello)
(Gary’s Channel, GO Video 2, hello)
(Gary’s Channel, GO Video 2, blue)
“Tag” Vector for Gary’s Channel
(hi:1, hello:2, blue:1)
MemSQL 28
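
The tag vector on the slide above can be sketched in a few lines of Python. This is only an illustration of the counting step, not the code used in the demo; the function name `tag_vectors` is an assumption, and the rows are the four example tuples from the slide.

```python
from collections import Counter, defaultdict

# Rows in the (channel, video, tag) shape shown on the slide.
rows = [
    ("Gary's Channel", "GO Video 1", "hi"),
    ("Gary's Channel", "GO Video 1", "hello"),
    ("Gary's Channel", "GO Video 2", "hello"),
    ("Gary's Channel", "GO Video 2", "blue"),
]

def tag_vectors(rows):
    """Count how often each tag appears per channel (hypothetical helper)."""
    vectors = defaultdict(Counter)
    for channel, _video, tag in rows:
        vectors[channel][tag] += 1
    # Convert the nested Counters to plain dicts for easy printing.
    return {channel: dict(counts) for channel, counts in vectors.items()}

vectors = tag_vectors(rows)
print(vectors["Gary's Channel"])  # {'hi': 1, 'hello': 2, 'blue': 1}
```

Each channel ends up as a sparse vector keyed by tag, which is the form that can be compared and clustered.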

Now we can compare vectors and
calculate clusters with k-means
MemSQL 29

k-means clustering partitions
observations into k clusters
Each observation belongs to the
cluster with the nearest mean,
serving as a prototype of the cluster
MemSQL 30
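
The "nearest mean" rule above can be sketched as follows. This is a minimal Python illustration that assumes dense numeric vectors and Euclidean distance; the slides do not say which distance the demo uses, and the points and centroids here are made up.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def assign(points, centroids):
    """Label every observation with the index of its nearest centroid."""
    return [nearest_centroid(p, centroids) for p in points]

# Made-up observations forming two obvious groups, and two candidate means.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.5), (10.0, 10.5)]

labels = assign(points, centroids)
print(labels)  # [0, 0, 1, 1]
```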

K-means in MemSQL with Extensibility
create or replace procedure k_means(num_its bigint, num_centroids bigint)
as
begin
  -- Seed the centroids once, then refine them num_its times.
  call initialize_centroids(num_centroids);
  for i in 1 .. num_its loop
    call k_means_iteration();
  end loop;
end //
MemSQL 33
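
The slide shows only the outer procedure. The bodies of `initialize_centroids` and `k_means_iteration` are not in the deck, so the Python sketch below fills them in with a common textbook choice (random initial centroids, then assign and re-average). Treat those bodies, the `seed` parameter, and the sample data as assumptions, not as the MemSQL implementation.

```python
import random

def initialize_centroids(points, num_centroids, seed=0):
    """Assumed initialization: pick num_centroids distinct points at random."""
    rng = random.Random(seed)
    return [list(p) for p in rng.sample(points, num_centroids)]

def k_means_iteration(points, centroids):
    """One assumed iteration: assign points to the nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for p in points:
        best = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        clusters[best].append(p)
    updated = []
    for old, members in zip(centroids, clusters):
        if members:
            # New centroid is the per-dimension mean of its members.
            updated.append([sum(col) / len(members) for col in zip(*members)])
        else:
            # Keep an empty cluster's centroid where it was.
            updated.append(old)
    return updated

def k_means(points, num_its, num_centroids, seed=0):
    """Same shape as the procedure on the slide: initialize, then loop."""
    centroids = initialize_centroids(points, num_centroids, seed)
    for _ in range(num_its):
        centroids = k_means_iteration(points, centroids)
    return centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
result = k_means(points, num_its=5, num_centroids=2)
print(sorted(result))
```

A fixed iteration count, as on the slide, keeps the procedure simple; a convergence check would be an alternative design.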

Demonstration Step 4 and 5
1. Launch cluster
2. Set up k_means functions with MemSQL extensibility
3. Load data
4. Train data
5. Gain insights
• important_tags.sql
• representative_channels.sql
MemSQL 34

Steps 4 and 5
Train and Gain Insights
MemSQL 35

important_tags.sql
-- For each centroid, rank tags by how far the centroid's value sits above the
-- average value of that tag across all centroids, and keep ranks 1 through 9.
select centroid_id, field_ids.field_id, importance, rn
from
(
  select centroids.centroid_id,
         centroids.field_id,
         centroids.val - centroid_sums.val importance,
         row_number() over (partition by centroids.centroid_id
                            order by centroids.val - centroid_sums.val desc) rn
  from centroids
  join
  (
    -- Average value of each field across all centroids.
    select field_id,
           sum(val) / (select count(distinct centroid_id) from centroids) as val
    from centroids
    group by field_id
  ) centroid_sums
  on centroids.field_id = centroid_sums.field_id
) centroids
join field_ids
on centroids.field_id = field_ids.id
where rn < 10
order by centroid_id, rn;
MemSQL 36
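
The same ranking logic can be followed step by step in Python. This sketch mirrors the query above under a few assumptions: centroid rows are `(centroid_id, field_id, val)` tuples, every field has a name in `field_names`, and the tag names and values are invented for illustration.

```python
from collections import defaultdict

def important_tags(centroid_rows, field_names, limit=10):
    """Rank fields per centroid by (centroid value - mean value across centroids)."""
    num_centroids = len({cid for cid, _fid, _val in centroid_rows})

    # Mean of each field over all centroids (the centroid_sums subquery).
    totals = defaultdict(float)
    for _cid, fid, val in centroid_rows:
        totals[fid] += val
    mean_val = {fid: total / num_centroids for fid, total in totals.items()}

    # Importance of each field within each centroid.
    by_centroid = defaultdict(list)
    for cid, fid, val in centroid_rows:
        by_centroid[cid].append((val - mean_val[fid], fid))

    # row_number() per centroid, highest importance first; keep rn < limit.
    result = []
    for cid in sorted(by_centroid):
        ranked = sorted(by_centroid[cid], key=lambda t: t[0], reverse=True)
        for rn, (importance, fid) in enumerate(ranked, start=1):
            if rn < limit:
                result.append((cid, field_names[fid], importance, rn))
    return result

# Invented example: two centroids, two tags.
centroid_rows = [(0, 1, 0.75), (0, 2, 0.25), (1, 1, 0.25), (1, 2, 0.75)]
field_names = {1: "music", 2: "gaming"}

top = important_tags(centroid_rows, field_names, limit=2)
print(top)  # [(0, 'music', 0.25, 1), (1, 'gaming', 0.25, 1)]
```

Subtracting the cross-centroid mean highlights tags that distinguish a cluster rather than tags that are common everywhere. Note that `rn < 10` in the query keeps nine rows per centroid, not ten.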

Check out our
book!
memsql.com/oreillyml
MemSQL 44

Thank you!
Visit us at the MemSQL
Booth (behind you)
Grab a T-shirt!
Chat with engineers
See more tech demos
@garyorenstein @memsql
MemSQL 45

Gartner Catalyst 2017: The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud

1. The Data Warehouse Blueprint for ML, AI, and Hybrid Cloud @garyorenstein @memsql MemSQL 1

2. Today’s Talk A Data Warehouse Blueprint for • Machine Learning and Artificial Intelligence • Hybrid Cloud Live demonstration of machine learning in SQL • K_means clustering MemSQL 2

3. Demonstration Step 1 1. Launch cluster 2. Set up k_means functions with MemSQL extensibility 3. Load data 4. Train data 5. Gain insights • important_tags.sql • representative_channels.sql MemSQL 3

4. Step 1 Launch Cluster MemSQL 4

5. The Real-Time Data Warehouse for the front lines of your business MemSQL 5

6. What is a real-time data warehouse? Similar to an “Operational Data Warehouse” MemSQL 6

7. A Real-Time Data Warehouse • Adds real-time to analytics • Reduces latency and ETL • Manages structured data, loaded continuously • Supports real-time decisions with embedded analytics • Serves as an operational data store • Delivers low latency reporting with automated queries MemSQL 7

8. MemSQL: A Real-Time Data Warehouse Streaming, Live and Historical Data Immediate Insights with SQL Scalable and distributed MemSQL 8

9. Sequel Pro Client and MemSQL Cluster MemSQL 9

10. MemSQL 10

11. Perspective MemSQL 11

12. MemSQL #1 Operational Data Warehouse in 2016 MemSQL 12

13. MemSQL Top “non-megavendor” Operational Data Warehouse in 2017 MemSQL 13

14. MemSQL 14

15. MemSQL 15

16. Digital Transformation is data based MemSQL 16

17. Digital Transformation database MemSQL 17

18. MemSQL is also a top ranked database by Gartner MemSQL 18

19. MemSQL Top “non-megavendor” HTAP Database in 2016 MemSQL 19

20. What is the advantage of being in both the data warehouse and database magic quadrants? MemSQL 20

21. INSERT UPDATE DELETE MemSQL 21

22. ...you can’t do AI without machine learning. You also can’t do machine learning without analytics, and you can’t do analytics without data infrastructure. — Hilary Mason, Data Scientist MemSQL 22

23. Demonstration Step 2 and 3 1. Launch cluster 2. Set up k_means functions with MemSQL extensibility 3. Load data 4. Train data 5. Gain insights • important_tags.sql • representative_channels.sql MemSQL 23

24. Step 2 and 3 Setup and Load MemSQL 24

25. MemSQL 25

26. Over a billion users Almost 1/3 of all people on the Internet Every day those users watch a billion hours of video, generating billions of views. MemSQL 26

27. Videos have tags What can they tell us? MemSQL 27

28. YouTube Tags Data Set Channel, Video, Tag (Gary’s Channel, GO Video 1, hi) (Gary’s Channel, GO Video 1, hello) (Gary’s Channel, GO Video 2, hello) (Gary’s Channel, GO Video 2, blue) “Tag” Vector for Gary’s Channel (hi:1, hello:2, blue:1) MemSQL 28

29. Now we can compare vectors and calculate clusters with k-means MemSQL 29

30. k-means clustering partitions observations into k clusters Each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster MemSQL 30

31. MemSQL 31

32. MemSQL 32

33. K-means in MemSQL with Extensibility create or replace procedure k_means(num_its bigint, num_centroids bigint) as begin call initialize_centroids(num_centroids); for i in 1 .. num_its loop call k_means_iteration(); end loop; end // MemSQL 33

34. Demonstration Step 4 and 5 1. Launch cluster 2. Set up k_means functions with MemSQL extensibility 3. Load data 4. Train data 5. Gain insights • important_tags.sql • representative_channels.sql MemSQL 34

35. Steps 4 and 5 Train and Gain Insights MemSQL 35

36. important_tags.sql select centroid_id, field_ids.field_id, importance, rn from ( select centroids.centroid_id, centroids.field_id, centroids.val - centroid_sums.val importance, row_number() over (partition by centroids.centroid_id order by centroids.val - centroid_sums.val desc) rn from centroids join ( select field_id, sum(val) / (select count(distinct centroid_id) from centroids) as val from centroids group by field_id ) centroid_sums on centroids.field_id = centroid_sums.field_id ) centroids join field_ids on centroids.field_id = field_ids.id where rn < 10 order by centroid_id, rn; MemSQL 36

37. k_means results MemSQL 37

38. MemSQL 38

39. representative channels MemSQL 39

40. A bit about Hybrid Cloud MemSQL 40

41. MemSQL 41

42. MemSQL 42

43. MemSQL 43

44. Check out our book! memsql.com/oreillyml MemSQL 44

45. Thank you! Visit us at the MemSQL Booth (behind you) Grab a tshirt! Chat with engineers See more tech demos @garyorenstein @memsql MemSQL 45