Using SQL-MapReduce for Advanced Analytical Queries by Rick F. van der Lans, R20/Consultancy BV
What Did the Users Want? BI reports Production databases
But What Did We Create? ODS data warehouse datamart production database cube
Problems with Current DW Platforms
45%  Poor query response
40%  Can’t support advanced analytics
39%  Inadequate data load speed
37%  Can’t scale to large data volumes
33%  Cost of scaling up is too expensive
29%  Poorly suited to real-time or on demand workloads
23%  Current platform is a legacy we must phase out
23%  Can’t support data modeling we need
21%  We need platform that supports mixed workloads
20%  Can’t support large concurrent user count
19%  Inadequate high availability
16%  Inadequate support for in-memory processing
16%  Inadequate support for web services and SOA
15%  Current platform is 32-bit, and we need 64-bit
14%  Current platform is SMP, and we need MPP
13%  We need platform better suited to cloud or virtualization
11%  Can’t secure the data properly
4%   Other
3%   No problems
Source: P. Russom, ‘Next Generation Data Warehouse Platforms’, TDWI Best Practices Report, fourth quarter 2009.
Need for More Powerful Data Warehouse Platforms
[Chart: when respondents plan to replace their current DW platform. Categories: 2009, 2010, 2011, 2012, 2013, 2014, 2015 or later, no plans to replace. Values as extracted, in chart order: 49%, 8%, 20%, 12%, 8%, 1%, 1%, 3%.]
Source: P. Russom, ‘Next Generation Data Warehouse Platforms’, TDWI Best Practices Report, fourth quarter 2009.
New Forms of Analytics Advanced Analytics Operational Analytics Deep Analytics Self-Service Analytics Complex Analytics Automated Analytics
Positioning of Advanced Analytics
[Figure: quadrant chart. Vertical axis: complexity of analytical queries (low to high). Horizontal axis: database size (low to high). Regions: simple queries on small to medium size databases; complex queries on small to medium size databases; simple queries on large to ultra large databases; advanced analytics.]
Parallelization of SQL
[Figure: a database server with one Master and three Workers processing the query below.]
    SELECT   *
    FROM     CUSTOMERS
    WHERE    LOCATION = 'New York'
How Easy Is Parallelizing SQL Queries? (1)
Example 1:
    SELECT   ID, SALES_DATE, PRICE
    FROM     SALES_RECORDS
    WHERE    PRICE > 100
Example 2:
    SELECT   REGION_ID, SUM(PRICE)
    FROM     SALES_RECORDS
    WHERE    PRICE > 100
    GROUP BY REGION_ID
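Why Examples 1 and 2 parallelize easily can be sketched in a few lines of Python. This is only an illustration under assumptions: the partitions, row layout, and function names below are hypothetical stand-ins for workers and a master, not part of any product. Each worker filters its own rows; for the group-by, each worker computes partial sums that the master merges.

```python
from collections import defaultdict

# Hypothetical SALES_RECORDS rows (id, region_id, price), split over three "workers".
PARTITIONS = [
    [(1, "EU", 150), (2, "US", 80)],
    [(3, "EU", 200), (4, "US", 120)],
    [(5, "EU", 90), (6, "AS", 300)],
]


def worker_filter(rows):
    """Example 1: a worker filters its own rows without talking to other workers."""
    return [row for row in rows if row[2] > 100]


def worker_partial_sums(rows):
    """Example 2, step 1: a worker sums PRICE per REGION_ID over its own rows."""
    sums = defaultdict(int)
    for _id, region, price in rows:
        if price > 100:
            sums[region] += price
    return dict(sums)


def master_merge(partials):
    """Example 2, step 2: the master adds up the partial sums per region."""
    totals = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            totals[region] += subtotal
    return dict(totals)


filtered = [row for part in PARTITIONS for row in worker_filter(part)]
totals = master_merge(worker_partial_sums(part) for part in PARTITIONS)
```

Neither step needs a worker to see another worker's rows, which is what makes these two queries straightforward to distribute.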
How Easy Is Parallelizing SQL Queries? (2)
Example 3: Get all the flights to London for which another flight exists to London that leaves within an hour on the same day.
    SELECT   *
    FROM     DEPARTURES AS D1
    WHERE    DESTINATION = 'London'
    AND      DEPARTURE_TIME + 60 MINUTES >=
            (SELECT   MIN(DEPARTURE_TIME)
             FROM     DEPARTURES AS D2
             WHERE    DESTINATION = 'London'
             AND      D2.DEPARTURE_TIME > D1.DEPARTURE_TIME
             AND      D2.DEPARTURE_DAY = D1.DEPARTURE_DAY)
    ORDER BY DEPARTURE_TIME
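To see what Example 3 returns, the query can be tried on a toy table with Python's built-in sqlite3 module. This is a sketch under assumptions: the sample flights are invented, and DEPARTURE_TIME is stored as minutes after midnight because SQLite has no `+ 60 MINUTES` interval syntax. Column references in the subquery are qualified for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DEPARTURES ("
    "FLIGHT TEXT, DESTINATION TEXT, DEPARTURE_DAY INTEGER, DEPARTURE_TIME INTEGER)"
)
# Hypothetical sample data; DEPARTURE_TIME is minutes after midnight.
conn.executemany(
    "INSERT INTO DEPARTURES VALUES (?, ?, ?, ?)",
    [
        ("BA1", "London", 1, 540),
        ("AF1", "Paris", 1, 545),
        ("BA2", "London", 1, 580),
        ("BA3", "London", 1, 700),
        ("BA4", "London", 2, 550),
    ],
)

QUERY = """
SELECT   D1.FLIGHT
FROM     DEPARTURES AS D1
WHERE    D1.DESTINATION = 'London'
AND      D1.DEPARTURE_TIME + 60 >=
        (SELECT   MIN(D2.DEPARTURE_TIME)
         FROM     DEPARTURES AS D2
         WHERE    D2.DESTINATION = 'London'
         AND      D2.DEPARTURE_TIME > D1.DEPARTURE_TIME
         AND      D2.DEPARTURE_DAY = D1.DEPARTURE_DAY)
ORDER BY D1.DEPARTURE_TIME
"""

# For every outer row the subquery looks at other rows of the same table,
# so a row cannot be evaluated in isolation on the worker that holds it.
flights = [row[0] for row in conn.execute(QUERY)]
```

Only BA1 qualifies here: BA2 leaves 40 minutes later on the same day, while the other London flights have no later same-day flight within the hour.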
How Easy Is Parallelizing SQL Queries? (3)
Example 4: Market basket analysis:
    SELECT   A.PROD_DESC AS ITEM1, B.PROD_DESC AS ITEM2, C.PROD_DESC AS ITEM3, COUNT(*) AS CNT
    FROM    (SELECT SF.STORE_ID, SF.REG_ID, SF.TRAN_NO, SF.ITEM_ID, SF.DT, PD.PROD_DESC, PD.PRICE
             FROM   SALES_FACT SF
             INNER JOIN PRODUCT_DIM PD
             ON     SF.ITEM_ID = PD.ITEM_ID) AS A,
            (SELECT SF.STORE_ID, SF.REG_ID, SF.TRAN_NO, SF.ITEM_ID, SF.DT, PD.PROD_DESC, PD.PRICE
             FROM   SALES_FACT SF
             INNER JOIN PRODUCT_DIM PD
             ON     SF.ITEM_ID = PD.ITEM_ID) AS B,
            (SELECT SF.STORE_ID, SF.REG_ID, SF.TRAN_NO, SF.ITEM_ID, SF.DT, PD.PROD_DESC, PD.PRICE
             FROM   SALES_FACT SF
             INNER JOIN PRODUCT_DIM PD
             ON     SF.ITEM_ID = PD.ITEM_ID) AS C
    WHERE    A.STORE_ID = B.STORE_ID AND B.STORE_ID = C.STORE_ID AND A.STORE_ID = C.STORE_ID
    AND      A.REG_ID = B.REG_ID AND B.REG_ID = C.REG_ID AND A.REG_ID = C.REG_ID
    AND      A.TRAN_NO = B.TRAN_NO AND B.TRAN_NO = C.TRAN_NO AND A.TRAN_NO = C.TRAN_NO
    AND      A.DT = B.DT AND B.DT = C.DT AND A.DT = C.DT
    AND      A.ITEM_ID <> B.ITEM_ID AND A.ITEM_ID <> C.ITEM_ID AND B.ITEM_ID <> C.ITEM_ID
    GROUP BY A.PROD_DESC, B.PROD_DESC, C.PROD_DESC
    HAVING   COUNT(*) > 1000
    ORDER BY COUNT(*) DESC;
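The logic behind Example 4 is compact when written procedurally, which is part of the argument for pushing such analyses into functions. The Python sketch below is an illustration only: the baskets are invented, the names are hypothetical, and the threshold is 1 instead of 1000 so the toy data produces a result. Unlike the SQL self-join, which returns every ordering of the same three items, it counts each unordered triple once.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets keyed by (store, register, transaction number, date).
BASKETS = {
    (1, 1, 100, "2010-01-05"): ["bread", "milk", "eggs"],
    (1, 2, 101, "2010-01-05"): ["bread", "milk", "eggs", "beer"],
    (2, 1, 55, "2010-01-06"): ["bread", "milk"],
}


def count_triples(baskets):
    """Count how many baskets contain each unordered triple of distinct items."""
    counts = Counter()
    for items in baskets.values():
        # sorted(set(...)) removes duplicates and fixes one order per triple.
        for triple in combinations(sorted(set(items)), 3):
            counts[triple] += 1
    return counts


THRESHOLD = 1  # the slide uses HAVING COUNT(*) > 1000
triples = count_triples(BASKETS)
frequent = [(t, n) for t, n in triples.most_common() if n > THRESHOLD]
```

Because each basket is processed on its own, the counting step can run per partition and the per-partition counters can be added together afterwards.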
Declarativeness and Storage Independence
Declarativeness: The developer only has to specify what has to be done, not how it should be done.
Storage independence: The language should hide how data is physically stored and how it is accessed.
Advantages of the Two Properties
Productivity: less code has to be written
Maintainability: less code means less code to maintain
Flexibility: changes to the storage layer can be made without changing the SQL code in the reports
Different Types of SQL Functions
Built-in or user-defined
    SELECT   FLIGHT, TRUNCATE(DEPARTURE_TIME, MINUTES)
    FROM     DEPARTURES AS D1
    WHERE    BANK_HOLIDAY(DEPARTURE_TIME) = 1
Scalar or table
    SELECT   AVG(DURATION)
    FROM     LAST_FIVE_ROWS(DEPARTURES)
Pure SQL, procedural, or external
Simple or complex
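The difference between a scalar and a table function can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the holiday set and sample rows are invented, BANK_HOLIDAY is registered as an external scalar function through `create_function`, and LAST_FIVE_ROWS is only emulated as a plain Python function over a fetched result, since the sqlite3 module offers no way to register table functions.

```python
import sqlite3

HOLIDAYS = {"2010-12-25"}  # hypothetical list of bank holidays


def bank_holiday(day):
    """Scalar function: one value in, one value out, called once per row."""
    return 1 if day in HOLIDAYS else 0


def last_five_rows(rows):
    """Table function (emulated): a set of rows in, a set of rows out."""
    return rows[-5:]


conn = sqlite3.connect(":memory:")
conn.create_function("BANK_HOLIDAY", 1, bank_holiday)
conn.execute("CREATE TABLE DEPARTURES (FLIGHT TEXT, DEPARTURE_DAY TEXT, DURATION INTEGER)")
conn.executemany(
    "INSERT INTO DEPARTURES VALUES (?, ?, ?)",
    [
        ("F1", "2010-12-24", 60),
        ("F2", "2010-12-25", 70),
        ("F3", "2010-12-25", 80),
        ("F4", "2010-12-26", 90),
        ("F5", "2010-12-27", 100),
        ("F6", "2010-12-28", 110),
    ],
)

# The user-defined scalar function is used inside SQL like a built-in one.
holiday_flights = [
    row[0]
    for row in conn.execute(
        "SELECT FLIGHT FROM DEPARTURES WHERE BANK_HOLIDAY(DEPARTURE_DAY) = 1 ORDER BY FLIGHT"
    )
]

# The emulated table function takes the whole result and returns a subset of rows.
all_rows = conn.execute("SELECT FLIGHT, DURATION FROM DEPARTURES ORDER BY rowid").fetchall()
recent = last_five_rows(all_rows)
avg_duration = sum(duration for _flight, duration in recent) / len(recent)
```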
MapReduce
MapReduce is a programming model introduced by Google
Aimed at processing requests on large data sets where the processing can be distributed over a large number of nodes working in parallel
Two steps: Map and Reduce
Map is like Select
Reduce is like Group-by
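The analogy "Map is like Select, Reduce is like Group-by" can be made concrete with a single-process Python sketch. It is an illustration only, not Google's implementation: the records and function names are hypothetical, and the shuffle step that a real framework performs across nodes is reduced to an in-memory grouping.

```python
from collections import defaultdict

# Hypothetical departure records: (destination, minutes after midnight).
RECORDS = [
    ("London", 540),
    ("Paris", 545),
    ("London", 580),
    ("Rome", 600),
    ("London", 700),
    ("Paris", 800),
]


def map_phase(record):
    """Map: like SELECT ... WHERE, emit a (key, value) pair per qualifying record."""
    destination, departure_time = record
    if departure_time < 720:  # keep only departures before noon
        yield destination, 1


def shuffle(pairs):
    """Shuffle: bring all values with the same key together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: like GROUP BY with an aggregate, combine the values of one key."""
    return key, sum(values)


pairs = [pair for record in RECORDS for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, so both phases can be spread over many nodes.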
Aster Data’s SQL-MapReduce (1)
SQL-MR is a set of built-in and user-defined external table functions
Example:
    SELECT   *
    FROM     GET_NEXT_FLIGHT_1HR
             (ON DEPARTURES PARTITION BY DESTINATION)
    WHERE    DESTINATION = 'London'
    ORDER BY DEPARTURE_TIME
All the SQL-MR function processing is parallelized
Including complex group-by operations and time-series analytics
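What a function such as GET_NEXT_FLIGHT_1HR might do internally can be sketched in Python. This is not Aster Data's API or implementation: the row layout, function names, and the thread pool standing in for worker nodes are all assumptions made for illustration. The idea is that PARTITION BY DESTINATION hands each partition to the function separately, so partitions can be processed in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical DEPARTURES rows: (flight, destination, day, minutes after midnight).
DEPARTURES = [
    ("BA1", "London", 1, 540),
    ("AF1", "Paris", 1, 545),
    ("BA2", "London", 1, 580),
    ("BA3", "London", 1, 700),
    ("AF2", "Paris", 1, 600),
    ("BA4", "London", 2, 550),
]


def partition_by_destination(rows):
    """Emulates PARTITION BY DESTINATION: one list of rows per destination."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)
    return list(partitions.values())


def next_flight_within_hour(partition):
    """Return flights followed by another flight on the same day within 60 minutes.

    Assumes departure times are distinct within a destination and day.
    """
    ordered = sorted(partition, key=lambda r: (r[2], r[3]))
    result = []
    for current, following in zip(ordered, ordered[1:]):
        same_day = current[2] == following[2]
        if same_day and following[3] - current[3] <= 60:
            result.append(current)
    return result


# Each partition is handled independently; threads stand in for worker nodes.
with ThreadPoolExecutor() as pool:
    per_partition = list(pool.map(next_flight_within_hour, partition_by_destination(DEPARTURES)))

matches = [row for part in per_partition for row in part]
london = sorted((row for row in matches if row[1] == "London"), key=lambda r: r[3])
```

A single sorted pass per partition replaces the correlated subquery, and no partition needs rows from another one.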
Aster Data’s SQL-MapReduce (2)
An SQL-MR function can contain the most complex analytical logic
SQL programmers don’t need to learn a new language; the functions can be written in Java, C++, Python, and many more
The SQL statements invoking SQL-MR functions are still declarative and storage-independent
The functions themselves are not
Usable by any BI tool supporting SQL
Supported Built-in Functions
SQL-MR: Technical Advantages and Technical Disadvantages
Simplification of queries
Efficiency of low-level programming language
Efficient data access
Predictable query performance
Linear scalability
Built-in functions

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

SQL-MapReduce for Advanced Analytics and Simplified Architecture

  • 1. Using SQL-MapReduce for Advanced Analytical Queries, by Rick F. van der Lans, R20/Consultancy BV
  • 2. What Did the Users Want? [diagram: BI reports, production databases]
  • 3. But What Did We Create? [diagram: production database, ODS, data warehouse, datamart, cube]
  • 4. Problems with Current DW Platforms: Poor query response (45%); Can’t support advanced analytics (40%); Inadequate data load speed (39%); Can’t scale to large data volumes (37%); Cost of scaling up is too expensive (33%); Poorly suited to real-time or on demand workloads (29%); Current platform is a legacy we must phase out (23%); Can’t support data modeling we need (23%); We need platform that supports mixed workloads (21%); Can’t support large concurrent user count (20%); Inadequate high availability (19%); Inadequate support for in-memory processing (16%); Inadequate support for web services and SOA (16%); Current platform is 32-bit, and we need 64-bit (15%); Current platform is SMP, and we need MPP (14%); We need platform better suited to cloud or virtualization (13%); Can’t secure the data properly (11%); Other (4%); No problems (3%). Source: P. Russom, ‘Next Generation Data Warehouse Platforms’, TDWI Best Practices Report, fourth quarter 2009.
  • 5. Need for More Powerful Data Warehouse Platforms [chart: when respondents plan to replace their current DW platform; categories: 2009, 2010, 2011, 2012, 2013, 2014, 2015 or later, no plans to replace; values as extracted from the slide: 49%, 8%, 20%, 12%, 8%, 1%, 1%, 3%]. Source: P. Russom, ‘Next Generation Data Warehouse Platforms’, TDWI Best Practices Report, fourth quarter 2009.
  • 6. New Forms of Analytics: Advanced Analytics, Operational Analytics, Deep Analytics, Self-Service Analytics, Complex Analytics, Automated Analytics
  • 7. Positioning of Advanced Analytics [quadrant chart; axes: complexity of analytical queries (low to high) and database size (low to high); quadrants: simple queries on small to medium size databases, complex queries on small to medium size databases, simple queries on large to ultra large databases, advanced analytics]
  • 8. Parallelization of SQL [diagram: a database server with one master and three workers processing the query] SELECT * FROM CUSTOMERS WHERE LOCATION = 'New York'
  • 9. How Easy Is Parallelizing SQL Queries? (1) Example 1: SELECT ID, SALES_DATE, PRICE FROM SALES_RECORDS WHERE PRICE > 100 Example 2: SELECT REGION_ID, SUM(PRICE) FROM SALES_RECORDS WHERE PRICE > 100 GROUP BY REGION_ID
  • 10. How Easy Is Parallelizing SQL Queries? (2) Example 3: Get all the flights to London for which another flight exists to London that leaves within an hour on the same day. SELECT * FROM DEPARTURES AS D1 WHERE DESTINATION = 'London' AND DEPARTURE_TIME + 60 MINUTES >= (SELECT MIN(DEPARTURE_TIME) FROM DEPARTURES AS D2 WHERE DESTINATION = 'London' AND D2.DEPARTURE_TIME > D1.DEPARTURE_TIME AND D2.DEPARTURE_DAY = D1.DEPARTURE_DAY) ORDER BY DEPARTURE_TIME
  • 11. How Easy Is Parallelizing SQL Queries? (3) Example 4: Market basket analysis: SELECT A.PROD_DESC AS ITEM1, B.PROD_DESC AS ITEM2, C.PROD_DESC AS ITEM3, COUNT(*) AS CNT FROM (SELECT SF.STORE_ID, SF.REG_ID, SF.TRAN_NO, SF.ITEM_ID, SF.DT, PD.PROD_DESC, PD.PRICE FROM SALES_FACT SF INNER JOIN PRODUCT_DIM PD ON SF.ITEM_ID = PD.ITEM_ID) AS A, (SELECT SF.STORE_ID, SF.REG_ID, SF.TRAN_NO, SF.ITEM_ID, SF.DT, PD.PROD_DESC, PD.PRICE FROM SALES_FACT SF INNER JOIN PRODUCT_DIM PD ON SF.ITEM_ID = PD.ITEM_ID) AS B, (SELECT SF.STORE_ID, SF.REG_ID, SF.TRAN_NO, SF.ITEM_ID, SF.DT, PD.PROD_DESC, PD.PRICE FROM SALES_FACT SF INNER JOIN PRODUCT_DIM PD ON SF.ITEM_ID = PD.ITEM_ID) AS C WHERE A.STORE_ID = B.STORE_ID AND B.STORE_ID = C.STORE_ID AND A.STORE_ID = C.STORE_ID AND A.REG_ID = B.REG_ID AND B.REG_ID = C.REG_ID AND A.REG_ID = C.REG_ID AND A.TRAN_NO = B.TRAN_NO AND B.TRAN_NO = C.TRAN_NO AND A.TRAN_NO = C.TRAN_NO AND A.DT = B.DT AND B.DT = C.DT AND A.DT = C.DT AND A.ITEM_ID <> B.ITEM_ID AND A.ITEM_ID <> C.ITEM_ID AND B.ITEM_ID <> C.ITEM_ID GROUP BY A.PROD_DESC, B.PROD_DESC, C.PROD_DESC HAVING COUNT(*) > 1000 ORDER BY COUNT(*) DESC;
  • 12. Declarativeness and Storage Independency Declarativeness: The developer has only to program what has to be done, and not how it should be done. Storage independency: The language should hide how data is physically stored and how it is accessed.
  • 13. Advantages of Two Properties. Productivity increase: less code has to be written. Maintainability: less code means less code to maintain. Flexibility: changes to the storage layer can be made without the need to change the SQL code in the reports.
  • 14. Different Types of SQL Functions: (1) Built-in or user-defined: SELECT FLIGHT, TRUNCATE(DEPARTURE_TIME, MINUTES) FROM DEPARTURES AS D1 WHERE BANK_HOLIDAY(DEPARTURE_TIME) = 1; (2) Scalar or table: SELECT AVG(DURATION) FROM LAST_FIVE_ROWS(DEPARTURES); (3) Pure SQL, procedural, or external; (4) Simple or complex
  • 15. MapReduce. MapReduce is a programming model introduced by Google. It is aimed at processing requests on large data sets where the processing can be distributed over a high number of nodes using parallel capabilities. Two steps: Map and Reduce. Map is like Select; Reduce is like Group-by.
  • 16. Aster Data’s SQL-MapReduce (1). SQL-MR is a set of built-in and user-defined external table functions. Example: SELECT * FROM GET_NEXT_FLIGHT_1HR (ON DEPARTURES PARTITION BY DESTINATION) WHERE DESTINATION = 'London' ORDER BY DEPARTURE_TIME. All the SQL-MR function processing is parallelized, including complex group-by operations and time-series analytics.
  • 17. Aster Data’s SQL-MapReduce (2). An SQL-MR function can contain the most complex analytical logic. SQL programmers don’t need to learn a new language; Java, C++, Python, and many more can be used to write the functions. The SQL statements invoking SQL-MR functions are still declarative and storage-independent; the functions themselves are not. Usable by any BI tool supporting SQL.
  • 21. Efficiency of low-level programming language
  • 27. Nesting of the functions
  • 28. A small group of developers may have to learn a new language
  • 29. Low-level language is not declarative
  • 31. Business Advantages of SQL-MR: Simplification of architecture, Deep analytics, Complex analytics, Operational analytics, Self-service analytics, No forbidden queries
  • 32. Simplification of Architecture [diagram: production database, ODS, data warehouse, datamart, cube, analytics, SQL-MR]
  • 33. Conclusions. The analytical and reporting demands are increasing. Most environments already have problems with performance. The marriage of SQL and MapReduce offers an enormous potential: parallelizing the processing of analytical logic.
  • 34. Business Advantages of SQL-MR: Simplification of architecture, Deep analytics, Complex analytics, Operational analytics, Self-service analytics, No forbidden queries
  • 35. Questions & Answers. Rick van der Lans, R20 Consultancy, e-mail: rick@r20.nl, website: http://www.r20.nl. Stephanie McReynolds, Director of Product Marketing, Aster Data, e-mail: smcreyno@asterdata.com. For more information on Aster Data: http://www.asterdata.com
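
Slide 15 compares Map to Select and Reduce to Group-by, and slide 9 gives a grouped aggregate (Example 2) as a query that is easy to parallelize. The following is a minimal Python sketch of that idea, not Aster Data's implementation: the sample rows, the round-robin split, and all function names are assumptions made for illustration, and local threads stand in for worker nodes. Each worker filters its own share of SALES_RECORDS and pre-aggregates SUM(PRICE) per REGION_ID; the master then merges the partial sums.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample of SALES_RECORDS rows: (id, region_id, price).
SALES_RECORDS = [
    (1, "EAST", 150.0),
    (2, "WEST", 80.0),
    (3, "EAST", 120.0),
    (4, "WEST", 300.0),
    (5, "NORTH", 101.0),
    (6, "EAST", 90.0),
]


def split_rows(rows, n_workers):
    """Distribute rows round-robin over the workers (a stand-in for table partitions)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def map_phase(partition):
    """Worker step: apply WHERE PRICE > 100 and pre-aggregate SUM(PRICE) per region."""
    partial = defaultdict(float)
    for _id, region_id, price in partition:
        if price > 100:
            partial[region_id] += price
    return dict(partial)


def reduce_phase(partials):
    """Master step: merge the workers' partial sums into the final GROUP BY result."""
    totals = defaultdict(float)
    for partial in partials:
        for region_id, subtotal in partial.items():
            totals[region_id] += subtotal
    return dict(totals)


def parallel_sum_by_region(rows, n_workers=3):
    """Emulate: SELECT REGION_ID, SUM(PRICE) FROM SALES_RECORDS
    WHERE PRICE > 100 GROUP BY REGION_ID, spread over n_workers."""
    partitions = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_phase, partitions))
    return reduce_phase(partials)


if __name__ == "__main__":
    print(parallel_sum_by_region(SALES_RECORDS))
```

The result does not depend on how the rows are split, because partial sums can be merged in any order. Queries such as Example 3 lack this property in plain SQL, which is the gap a partitioned table function like GET_NEXT_FLIGHT_1HR is meant to fill.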