In-Database
Predictive Analytics
        John A. De Goes
   @jdegoes, john@precog.com
Agenda




  •   Introduction
  •   Abusing SQL
  •   Painful by Design
  •   Database Extensions
  •   MADlib
  •   Other Approaches
  •   Summary
Introduction




    In-Database Predictive Analytics

    In-database predictive analytics refers
    to the process of performing
    advanced predictive analytics directly
    inside the database.
Introduction


      Traditional Predictive Analytics

      [Diagram: R and SAS, each connected to the database]
Introduction




      [Diagram: R and SAS connected to the database]

               Data Bottleneck:
                Painful, Slow
Introduction




               What’s the answer?
Introduction
        Move the Code, not the Data!

        [Diagram: Advanced Analytics, “MapReduce”]
Abusing SQL




         Let’s Do K-Means in SQL!
Abusing SQL
       General Approach in RDBMS

       [Diagram: a Driver sends SQL to the Database; the Database returns feedback to the Driver]
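The driver/feedback loop can be sketched in Python, with the standard-library sqlite3 module standing in for the database. This is a minimal sketch, not code from the deck: the table state(x) and the halving step are hypothetical placeholders. What it shows is the shape of the loop: each pass sends SQL that does the work inside the database, and only a single feedback value travels back to decide whether to continue.

```python
import sqlite3


def drive(conn, step_sql, feedback_sql, done, max_iter=100):
    """Send step_sql to the database repeatedly; after each pass read back a
    single feedback value with feedback_sql and stop once done(value) holds."""
    value = None
    for iteration in range(1, max_iter + 1):
        conn.executescript(step_sql)                      # work runs in the database
        (value,) = conn.execute(feedback_sql).fetchone()  # only a scalar comes back
        if done(value):
            return iteration, value
    return max_iter, value


# Hypothetical toy state: one row, halved on every pass until it drops below 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (x REAL)")
conn.execute("INSERT INTO state VALUES (100.0)")

iterations, final_x = drive(
    conn,
    step_sql="UPDATE state SET x = x / 2;",
    feedback_sql="SELECT x FROM state",
    done=lambda x: x < 1,
)
print(iterations, final_x)  # 7 0.78125
```

For k-means, step_sql would be one full iteration of the statements on the following slides, and the feedback value could be the change in avg_q between iterations.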
Abusing SQL
                       Our Initial Model

   Table model(d, k, n, iteration, avg_q):
     • d: number of dimensions
     • k: number of clusters
     • n: number of points
     • iteration: number of iterations
     • avg_q: variance
Abusing SQL
              Our Initial Data Set

   Table Y(Y1, Y2, Y3, ..., Yd): one column per dimension, n rows
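As a runnable starting point, the data set and the model table can be created in SQLite from Python. A minimal sketch, assuming d = 2, k = 2 and four hypothetical points; the column names follow the slides, while the column types and leaving avg_q NULL until the first iteration are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Input data set Y: n rows, one column per dimension (here d = 2).
conn.execute("CREATE TABLE Y (Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO Y VALUES (?, ?)",
                 [(1, 1), (1, 2), (8, 8), (9, 8)])

# One-row model table holding the run's parameters and progress.
conn.execute("""CREATE TABLE model (
    d INTEGER,          -- number of dimensions
    k INTEGER,          -- number of clusters
    n INTEGER,          -- number of points
    iteration INTEGER,  -- number of iterations so far
    avg_q REAL          -- variance (quality measure), NULL until computed
)""")
d, k = 2, 2
# n is counted inside the database rather than passed in by the driver.
conn.execute("INSERT INTO model SELECT ?, ?, count(*), 0, NULL FROM Y", (d, k))

model = conn.execute("SELECT d, k, n, iteration, avg_q FROM model").fetchone()
print(model)  # (2, 2, 4, 0, None)
```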
Abusing SQL
             Projection & Numbering

     Y(Y1, Y2, Y3, ..., Yd)  →  YH(i, Y1, ..., Yd), where i numbers the rows 1..n

       INSERT INTO YH
       SELECT sum(1) over(rows unbounded preceding) AS i, Y1, Y2, ..., Yd
       FROM Y;
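The numbering step can be tried in SQLite. A sketch assuming d = 2 and made-up data; it uses ROW_NUMBER(), the standard window function for numbering rows, in place of the running sum(1) on the slide (both assign 1..n). Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Y (Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO Y VALUES (?, ?)",
                 [(1, 1), (1, 2), (8, 8), (9, 8)])

# YH = Y plus a point id i in 1..n, assigned in insertion (rowid) order.
conn.execute("CREATE TABLE YH (i INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")
conn.execute("""
    INSERT INTO YH
    SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS i, Y1, Y2
    FROM Y
""")

rows = conn.execute("SELECT i, Y1, Y2 FROM YH ORDER BY i").fetchall()
print(rows)
```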
Abusing SQL
                           Flattening

       YH(i, Y1, ..., Yd)  →  YV(i, l, val): one row per point i and dimension l (n x d rows)

                   INSERT INTO YV SELECT i,1,Y1 FROM YH;
                   ...
                   INSERT INTO YV SELECT i,d,Yd FROM YH;
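Because the column list depends on d, the driver has to generate the d INSERT statements itself. A sketch with d = 2 and hypothetical data; building SQL with an f-string is only acceptable here because the column names come from trusted integers, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
d = 2
conn.execute("CREATE TABLE YH (i INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO YH VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 2), (3, 8, 8), (4, 9, 8)])

# YV holds one row per (point i, dimension l): n x d rows in total.
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")

# One INSERT per dimension, exactly as on the slide.
for l in range(1, d + 1):
    conn.execute(f"INSERT INTO YV SELECT i, {l}, Y{l} FROM YH")

rows = conn.execute("SELECT i, l, val FROM YV ORDER BY i, l").fetchall()
print(rows)
```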
Abusing SQL
       Initializing k Cluster Centers

       YH(i, Y1, ..., Yd)  →  CH(j, Y1, ..., Yd): one row per cluster center j = 1..k

                    INSERT INTO CH
                    SELECT 1, Y1, ..., Yd FROM YH SAMPLE 1;
                    ...
                    INSERT INTO CH
                    SELECT k, Y1, ..., Yd FROM YH SAMPLE 1;
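SQLite has no SAMPLE clause, so this sketch swaps in ORDER BY random() LIMIT k, which draws k distinct rows in a single statement. Issuing k independent SAMPLE 1 queries, as on the slide, can pick the same point twice. The data, d = 2 and k = 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
k = 2
conn.execute("CREATE TABLE YH (i INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO YH VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 2), (3, 8, 8), (4, 9, 8)])

# CH holds one row per cluster center j = 1..k, in horizontal layout.
conn.execute("CREATE TABLE CH (j INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")

# Pick k distinct random points and number them 1..k as the centers.
conn.execute("""
    INSERT INTO CH
    SELECT ROW_NUMBER() OVER () AS j, Y1, Y2
    FROM (SELECT Y1, Y2 FROM YH ORDER BY random() LIMIT ?)
""", (k,))

centers = conn.execute("SELECT j, Y1, Y2 FROM CH ORDER BY j").fetchall()
print(centers)
```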
Abusing SQL
                          Flattening

       CH(j, Y1, ..., Yd)  →  C(l, j, val): one row per dimension l and cluster j (d x k rows)

                    INSERT INTO C
                    SELECT 1, 1, Y1 FROM CH WHERE j = 1;
                    ...
                    INSERT INTO C
                    SELECT d, k, Yd FROM CH WHERE j = k;
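In this sketch the slide's d x k single-row INSERTs collapse into d statements by dropping the WHERE j = ... filter; the resulting C table has the same rows. The centers (1,1) and (8,8) and d = 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
d = 2
conn.execute("CREATE TABLE CH (j INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO CH VALUES (?, ?, ?)", [(1, 1, 1), (2, 8, 8)])

# C holds one row per (dimension l, cluster j): d x k rows.
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")

# One INSERT per dimension copies that coordinate of every center at once.
for l in range(1, d + 1):
    conn.execute(f"INSERT INTO C SELECT {l}, j, Y{l} FROM CH")

rows = conn.execute("SELECT l, j, val FROM C ORDER BY l, j").fetchall()
print(rows)
```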
Abusing SQL
     Computing Distances to Clusters

    YD(i, j, dist): distance from each point i to each cluster center j (n x k rows)

    INSERT INTO YD
    SELECT i, j, sum((YV.val - C.val)**2)
    FROM YV, C WHERE YV.l = C.l
    GROUP BY i, j;
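The distance query runs almost unchanged in SQLite; the exception is the exponent, since SQLite has no ** operator and the square is written as a product. The data are hypothetical: points (1,1), (1,2), (8,8), (9,8) and centers (1,1), (8,8), already flattened. As on the slide, dist is the squared Euclidean distance, which is enough for finding the nearest center.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")
conn.executemany("INSERT INTO YV VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 2),
                  (3, 1, 8), (3, 2, 8), (4, 1, 9), (4, 2, 8)])
conn.executemany("INSERT INTO C VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (1, 2, 8), (2, 2, 8)])

# Join each point coordinate with the matching center coordinate (same l),
# then sum the squared differences per (point, center) pair.
conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")
conn.execute("""
    INSERT INTO YD
    SELECT YV.i, C.j, sum((YV.val - C.val) * (YV.val - C.val))
    FROM YV JOIN C ON YV.l = C.l
    GROUP BY YV.i, C.j
""")

yd = {(i, j): dist for i, j, dist in conn.execute("SELECT i, j, dist FROM YD")}
print(yd)
```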
Abusing SQL
          Computing Nearest Neighbors

     YNN(i, j): nearest cluster j for each point i (n rows)

     INSERT INTO YNN
     SELECT YD.i, YD.j
     FROM YD,
       (SELECT i, min(dist) AS mindist FROM YD
        GROUP BY i) YMIND
     WHERE YD.i = YMIND.i
       and YD.dist = YMIND.mindist;
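The nearest-neighbor query, run against a hand-filled YD holding hypothetical squared distances for four points and two clusters. The join compares against dist, the column YD is created with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")
conn.executemany("INSERT INTO YD VALUES (?, ?, ?)",
                 [(1, 1, 0), (1, 2, 98), (2, 1, 1), (2, 2, 85),
                  (3, 1, 98), (3, 2, 0), (4, 1, 113), (4, 2, 1)])

# For each point, keep the cluster(s) whose distance equals the point's minimum.
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.execute("""
    INSERT INTO YNN
    SELECT YD.i, YD.j
    FROM YD,
         (SELECT i, min(dist) AS mindist FROM YD GROUP BY i) YMIND
    WHERE YD.i = YMIND.i
      AND YD.dist = YMIND.mindist
""")

ynn = conn.execute("SELECT i, j FROM YNN ORDER BY i").fetchall()
print(ynn)
```

A point that is exactly equidistant from two centers gets two rows in YNN. One possible tie-break, an assumption rather than something the slides specify, is to keep min(j) per point with a further GROUP BY i.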
Abusing SQL
         Count Points Per Cluster



    INSERT INTO W SELECT j, count(*)
    FROM YNN GROUP BY j;
    UPDATE W SET w = CAST(w AS FLOAT) / (SELECT n FROM model);
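The weight step in SQLite. In this sketch n is read from the one-row model table with a scalar subquery, and the division is forced into floating point; with integer operands, count(*)/n would truncate to 0. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (d INTEGER, k INTEGER, n INTEGER, "
             "iteration INTEGER, avg_q REAL)")
conn.execute("INSERT INTO model VALUES (2, 2, 4, 0, NULL)")
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.executemany("INSERT INTO YNN VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2)])

# W(j, w): first the number of points per cluster ...
conn.execute("CREATE TABLE W (j INTEGER, w REAL)")
conn.execute("INSERT INTO W SELECT j, count(*) FROM YNN GROUP BY j")
# ... then the fraction of all n points that landed in each cluster.
conn.execute("UPDATE W SET w = CAST(w AS REAL) / (SELECT n FROM model)")

weights = dict(conn.execute("SELECT j, w FROM W"))
print(weights)
```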
Abusing SQL
         Compute New Centroids



    DELETE FROM C;
    INSERT INTO C
    SELECT l, j, avg(YV.val) FROM YV, YNN
    WHERE YV.i = YNN.i GROUP BY l, j;
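Recomputing centroids in SQLite. This sketch clears C before inserting the new centers, so C ends up with exactly d x k rows instead of old and new centers side by side. The data are hypothetical: points (1,1), (1,2), (8,8), (9,8), with points 1 and 2 assigned to cluster 1 and points 3 and 4 to cluster 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
conn.executemany("INSERT INTO YV VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 2),
                  (3, 1, 8), (3, 2, 8), (4, 1, 9), (4, 2, 8)])
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.executemany("INSERT INTO YNN VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2)])
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")
conn.executemany("INSERT INTO C VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (1, 2, 8), (2, 2, 8)])  # old centers

# New center coordinate for (l, j) = mean of dimension l over cluster j's points.
conn.execute("DELETE FROM C")
conn.execute("""
    INSERT INTO C
    SELECT YV.l, YNN.j, avg(YV.val)
    FROM YV JOIN YNN ON YV.i = YNN.i
    GROUP BY YV.l, YNN.j
""")

centers = conn.execute("SELECT l, j, val FROM C ORDER BY j, l").fetchall()
print(centers)
```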
Abusing SQL
              Compute Variances

    INSERT INTO R
    SELECT C.l, C.j, avg((YV.val - C.val)**2)
    FROM C, YV, YNN
    WHERE YV.i = YNN.i
      and YV.l = C.l and YNN.j = C.j
    GROUP BY C.l, C.j;
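The variance query in SQLite, with the square written as a product. The slides do not name R's value column; val is an assumption here, as are the data: centers (1, 1.5) and (8.5, 8) for the hypothetical points (1,1), (1,2) in cluster 1 and (8,8), (9,8) in cluster 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
conn.executemany("INSERT INTO YV VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 2),
                  (3, 1, 8), (3, 2, 8), (4, 1, 9), (4, 2, 8)])
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.executemany("INSERT INTO YNN VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2)])
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")
conn.executemany("INSERT INTO C VALUES (?, ?, ?)",
                 [(1, 1, 1.0), (2, 1, 1.5), (1, 2, 8.5), (2, 2, 8.0)])

# R(l, j, val): per-dimension variance of the points assigned to cluster j.
conn.execute("CREATE TABLE R (l INTEGER, j INTEGER, val REAL)")
conn.execute("""
    INSERT INTO R
    SELECT C.l, C.j, avg((YV.val - C.val) * (YV.val - C.val))
    FROM C, YV, YNN
    WHERE YV.i = YNN.i
      AND YV.l = C.l AND YNN.j = C.j
    GROUP BY C.l, C.j
""")

variances = {(l, j): v for l, j, v in conn.execute("SELECT l, j, val FROM R")}
print(variances)
```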
Abusing SQL
               Update Model

    INSERT INTO R
    SELECT C.l, C.j, avg((YV.val - C.val)**2)
    FROM C, YV, YNN
    WHERE YV.i = YNN.i
      and YV.l = C.l and YNN.j = C.j
    GROUP BY C.l, C.j;
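The Update Model slide repeats the variance query, so the update of the model row itself is not shown. A minimal sketch of that step, assuming avg_q is the weight-averaged variance (the sum over clusters j of w_j times the sum of R over dimensions) and that iteration is simply incremented; both definitions are assumptions, not taken from the deck. R's val column and all data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (d INTEGER, k INTEGER, n INTEGER, "
             "iteration INTEGER, avg_q REAL)")
conn.execute("INSERT INTO model VALUES (2, 2, 4, 0, NULL)")
conn.execute("CREATE TABLE W (j INTEGER, w REAL)")
conn.executemany("INSERT INTO W VALUES (?, ?)", [(1, 0.5), (2, 0.5)])
conn.execute("CREATE TABLE R (l INTEGER, j INTEGER, val REAL)")
conn.executemany("INSERT INTO R VALUES (?, ?, ?)",
                 [(1, 1, 0.0), (2, 1, 0.25), (1, 2, 0.25), (2, 2, 0.0)])

# Assumed quality measure: sum over (l, j) of w_j * R_lj, i.e. each cluster's
# total variance weighted by the fraction of points it holds.
conn.execute("""
    UPDATE model
    SET iteration = iteration + 1,
        avg_q = (SELECT sum(W.w * R.val) FROM W JOIN R ON W.j = R.j)
""")

model = conn.execute("SELECT d, k, n, iteration, avg_q FROM model").fetchone()
print(model)  # (2, 2, 4, 1, 0.25)
```

A driver would compare avg_q across iterations and stop once it no longer changes meaningfully.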
Abusing SQL




          Let’s not do that again!
Painful by Design




      Why are predictive analytics so
         hard to express in SQL?
Painful by Design
                    #1: No Arrays

   SQL has sets (rows) and tuples (columns), but no arrays.
Painful by Design
         #2: Relational Algebra Sucks

         What it has:
           • Projection, Selection, Rename, Natural Join
           • Semijoin, Antijoin, Division (R ÷ S), Theta Join
           • Left outer join (R ⟕ S), Right outer join (R ⟖ S), Full outer join (R ⟗ S)
           • Aggregation: $_{G_1, G_2, \ldots, G_m}\mathcal{G}_{f_1(A_1'), f_2(A_2'), \ldots, f_k(A_k')}(r)$

         What it lacks:
           • Iteration, Recursion, Multiple Dimensions
Database Extensions




      There’s GOT to be a better way!
Database Extensions




                      C Extension
Database Extensions




      UDF: User-Defined Function (Map)
        map(a)
        op2(a, b)

      UDA: User-Defined Aggregate (Reduce)
        init(a)
        accum(a, b)
        merge(a, b)
        final(a)
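The UDA contract can be sketched in plain Python to show why merge exists: a parallel database accumulates each partition separately and then combines the partial states. In this sketch the state is (sum, count) and the aggregate is the mean, i.e. one centroid coordinate. The argument lists are simplified from the slide's init(a), accum(a, b), merge(a, b), final(a) and do not follow any particular database's extension API.

```python
from functools import reduce


def init():
    """Empty transition state: (running sum, running count)."""
    return (0.0, 0)


def accum(state, value):
    """Fold one input row into the state."""
    s, c = state
    return (s + value, c + 1)


def merge(a, b):
    """Combine two partial states computed on different partitions."""
    return (a[0] + b[0], a[1] + b[1])


def final(state):
    """Turn the state into the aggregate's result (None for no rows)."""
    s, c = state
    return s / c if c else None


def aggregate(partitions):
    """Simulate a parallel database: accumulate each partition independently,
    then merge the partial states and finalize once."""
    partials = [reduce(accum, part, init()) for part in partitions]
    return final(reduce(merge, partials, init()))


mean = aggregate([[1.0, 2.0], [8.0], [9.0]])
print(mean)  # 5.0
```

Because merge is associative, the result does not depend on how rows are split across partitions, which is what lets the database parallelize the aggregate.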
MADlib




   MADlib is an open-source library for
   scalable in-database analytics.
   It is implemented using database
   extensions written in C, and is available
   for PostgreSQL and Greenplum.
MADlib
          1. Download the binary


  Mac OS X
  http://www.madlib.net/files/madlib-0.6-Darwin.dmg


  Linux
  http://www.madlib.net/files/madlib-0.6-Linux.rpm
MADlib
             2. Start the Installation



  Mac OS X
  Double-click on installer


  Linux
  yum install $MADLIB_PACKAGE --nogpgcheck
MADlib
              3. Verify the Database Tools Are on the PATH


  Greenplum
  source /path/to/greenplum/greenplum_path.sh


  PostgreSQL
  Make sure psql is in PATH
MADlib
               4. Register MADlib


  Greenplum
  /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install


  PostgreSQL
  /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install
MADlib
               5. Test Installation


  Greenplum
  /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install-check


  PostgreSQL
  /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install-check
MADlib
           Clustering in MADlib



  SELECT * FROM kmeans_random(
     'rel_source', 'expr_point', k,
     [ 'fn_dist', 'agg_centroid',
       max_num_iterations, min_frac_reassigned ]
  );
MADlib




         Ahhhhhh......
MADlib
         Our Way or the Highway




                Composability
Other Approaches




              RDBMS Isn’t the
            Only Game in Town!
Other Approaches
                    1. Embrace Coding


  • Hadoop Ecosystem
   • Mahout, Cascading/Scalding, Crunch/Scrunch, Pangool, Cascalog, and,
     of course, MapReduce
  • BDAS Ecosystem
   • Spark
Other Approaches
                        2. Reject RDBMS



  • Datalog + variants
   • In theory, ideal for many kinds of predictive analytics
   • Suffers from a lack of distributed, feature-complete implementations
Other Approaches
                         2. Reject RDBMS


  • Rasdaman / RASQL
    • Arrays but not analytics


  Community Editions
  http://www.rasdaman.org
Other Approaches
                           2. Reject RDBMS


  • MonetDB / SciQL
    • Array extension of SQL
    • Poor analytics


  Community Editions
  http://www.monetdb.org
Other Approaches
                        2. Reject RDBMS


  • SciDB / AFL (AQL)
    • Excellent analytics
    • Limited composability


  Community Editions
  http://www.scidb.org/forum/viewtopic.php?f=16&t=364/
Other Approaches
                         2. Reject RDBMS


  • Precog / Quirrel (simple “R for big data”)
    • Multidimensional, arrays + functions
    • Still immature


  Community Editions
  http://www.precog.com/editions/precog-for-mongodb (MongoDB)
  http://www.precog.com/editions/precog-for-postgresql (PostgreSQL)
Summary


  • Increase performance, reduce friction by doing more inside
    the database

  • Not a panacea
   • Hard to do in SQL
   • Hard to do in C (but you may not have to: MADlib)
   • Pre-canned & brittle in most databases


  • Ultimately what’s needed is tech designed for advanced
    analytics
Q&A
     John A. De Goes
@jdegoes, john@precog.com
References




  • Programming the K-means Clustering Algorithm in SQL
    (Teradata, NCR)

How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Kürzlich hochgeladen (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

In-Database Predictive Analytics

  • 1. In-Database Predictive Analytics John A. De Goes @jdegoes, john@precog.com
  • 2. Agenda • Introduction • Abusing SQL • Painful by Design • Database Extensions • MADlib • Other Approaches • Summary
  • 3. Introduction In-Database Predictive Analytics In-database predictive analytics refers to the process of performing advanced predictive analytics directly inside the database.
  • 4. Introduction Traditional Predictive Analytics (diagram: R and SAS working outside the database)
  • 5. Introduction (diagram: data moving between the database and R / SAS) Data Bottleneck: Painful, Slow
  • 6. Introduction What’s the answer?
  • 7. Introduction Move the Code, not the Data! Advanced Analytics “MapReduce”
  • 8. Abusing SQL Let’s Do K-Means in SQL!
  • 9. Abusing SQL General Approach in RDBMS (diagram: a driver program sends SQL to the database and receives feedback from it)
  • 10. Abusing SQL Our Initial Model: a table model(d, k, n, iteration, avg_q), where d = number of dimensions, k = number of clusters, n = number of points, iteration = number of iterations, avg_q = variance
  • 11. Abusing SQL Our Initial Data Set: a table Y with columns Y1, Y2, Y3, ..., Yd and n rows
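The following is a minimal sketch of these two tables, built with Python's built-in sqlite3 module. It illustrates the schema under some assumptions and is not the deck's code. The SQLite dialect stands in for the Teradata SQL on the slides, and the data (n = 4 points, d = 2, k = 2) is hypothetical. The columns iteration and avg_q start as 0 and NULL and would be updated as the algorithm runs.

```python
import sqlite3

# Hypothetical sample data: n = 4 points in d = 2 dimensions, to be split into k = 2 clusters.
POINTS = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
K = 2

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (d INTEGER, k INTEGER, n INTEGER, iteration INTEGER, avg_q REAL)")
conn.execute("CREATE TABLE Y (Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO Y VALUES (?, ?)", POINTS)

# d and k are chosen up front, n is counted from Y; iteration and avg_q are filled in later.
conn.execute("INSERT INTO model SELECT 2, ?, count(*), 0, NULL FROM Y", (K,))
print(conn.execute("SELECT d, k, n, iteration FROM model").fetchone())  # (2, 2, 4, 0)
```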
  • 12. Abusing SQL Projection & Numbering: copy Y into YH(i, Y1, ..., Yd), numbering the rows i = 1, ..., n. INSERT INTO YH SELECT sum(1) over(rows unbounded preceding) AS i, Y1, Y2, ..., Yd FROM Y;
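The numbering trick can be tried in SQLite from Python. This needs SQLite 3.25 or later, the release that added window functions. The sketch uses hypothetical data with d = 2. It also adds ORDER BY rowid to the window so that the numbers follow insertion order. This is an assumption on my part: without an ORDER BY, the order in which rows get numbered is not defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Y (Y1 REAL, Y2 REAL)")  # hypothetical data set, d = 2
conn.executemany("INSERT INTO Y VALUES (?, ?)",
                 [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)])
conn.execute("CREATE TABLE YH (i INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")

# Same running-sum trick as the slide: summing 1 over all rows up to the current
# one yields i = 1, 2, ..., n. ORDER BY rowid (an addition) fixes the numbering order.
conn.execute("""
    INSERT INTO YH
    SELECT sum(1) OVER (ORDER BY rowid ROWS UNBOUNDED PRECEDING) AS i, Y1, Y2
    FROM Y
""")
rows = conn.execute("SELECT i, Y1, Y2 FROM YH ORDER BY i").fetchall()
print(rows)  # [(1, 1.0, 1.0), (2, 1.0, 2.0), (3, 8.0, 8.0), (4, 9.0, 8.0)]
```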
  • 13. Abusing SQL Flattening: turn YH(i, Y1, ..., Yd) into the vertical table YV(i, l, val), with one row per point and dimension (n x d rows). INSERT INTO YV SELECT i, 1, Y1 FROM YH; ... INSERT INTO YV SELECT i, d, Yd FROM YH;
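SQL cannot loop over column names, so the driver program has to generate the d INSERT statements on this slide. Here is a minimal Python/SQLite sketch of that, assuming hypothetical data with d = 2 and n = 3:

```python
import sqlite3

D = 2  # hypothetical number of dimensions d
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YH (i INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO YH VALUES (?, ?, ?)",
                 [(1, 1.0, 1.0), (2, 1.0, 2.0), (3, 8.0, 8.0)])
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")

# One INSERT per dimension l = 1..d, generated by the driver; l is an int from
# range(), so formatting it into the SQL text is safe here.
for l in range(1, D + 1):
    conn.execute(f"INSERT INTO YV SELECT i, {l}, Y{l} FROM YH")

rows = conn.execute("SELECT i, l, val FROM YV ORDER BY i, l").fetchall()
print(len(rows))  # 6, i.e. n x d
```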
  • 14. Abusing SQL Initializing k Cluster Centers: sample k points from YH into CH(j, Y1, ..., Yd), j = 1, ..., k. INSERT INTO CH SELECT 1, Y1, ..., Yd FROM YH SAMPLE 1; ... INSERT INTO CH SELECT k, Y1, ..., Yd FROM YH SAMPLE 1;
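SAMPLE is a Teradata clause, and SQLite has no equivalent. One possible substitute draws all k centers in a single statement with ORDER BY random() LIMIT k; the sketch below runs it on hypothetical data. Unlike k separate SAMPLE 1 queries, this never picks the same row twice. The names K and picked are illustrative.

```python
import sqlite3

K = 2  # hypothetical number of clusters k
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YH (i INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")
conn.executemany("INSERT INTO YH VALUES (?, ?, ?)",
                 [(1, 1.0, 1.0), (2, 1.0, 2.0), (3, 8.0, 8.0), (4, 9.0, 8.0)])
conn.execute("CREATE TABLE CH (j INTEGER PRIMARY KEY, Y1 REAL, Y2 REAL)")

# Shuffle YH, keep k distinct rows, and number them j = 1..k with row_number().
conn.execute("""
    INSERT INTO CH
    SELECT row_number() OVER () AS j, Y1, Y2
    FROM (SELECT Y1, Y2 FROM YH ORDER BY random() LIMIT ?) AS picked
""", (K,))
print(conn.execute("SELECT j, Y1, Y2 FROM CH ORDER BY j").fetchall())
```

The output varies from run to run, because the centers are chosen at random.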
  • 15. Abusing SQL Flattening CH: turn CH(j, Y1, ..., Yd) into C(l, j, val), d x k rows. INSERT INTO C SELECT 1, 1, Y1 FROM CH WHERE j = 1; ... INSERT INTO C SELECT d, k, Yd FROM CH WHERE j = k;
  • 16. Abusing SQL Computing Distances to Clusters: YD(i, j, dist), n x k rows. INSERT INTO YD SELECT i, j, sum((YV.val - C.val)**2) FROM YV, C WHERE YV.l = C.l GROUP BY i, j;
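The distance step is a join on the dimension index l followed by a GROUP BY. Below is a sketch of it in SQLite via Python, using hypothetical points and centroids. SQLite has no ** operator, so the square is written as a product. Note that dist is the squared Euclidean distance. That is enough for picking the nearest centroid, because squaring does not change which distance is smallest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")  # points, flattened
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")   # centroids, flattened
# Hypothetical data: points (1, 1) and (8, 8); centroid j=1 at (1, 2), j=2 at (9, 8).
conn.executemany("INSERT INTO YV VALUES (?, ?, ?)",
                 [(1, 1, 1.0), (1, 2, 1.0), (2, 1, 8.0), (2, 2, 8.0)])
conn.executemany("INSERT INTO C VALUES (?, ?, ?)",
                 [(1, 1, 1.0), (2, 1, 2.0), (1, 2, 9.0), (2, 2, 8.0)])

conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")
# Squared Euclidean distance from every point i to every centroid j.
conn.execute("""
    INSERT INTO YD
    SELECT YV.i, C.j, sum((YV.val - C.val) * (YV.val - C.val))
    FROM YV JOIN C ON YV.l = C.l
    GROUP BY YV.i, C.j
""")
result = conn.execute("SELECT i, j, dist FROM YD ORDER BY i, j").fetchall()
print(result)  # [(1, 1, 1.0), (1, 2, 113.0), (2, 1, 85.0), (2, 2, 1.0)]
```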
  • 17. Abusing SQL Computing Nearest Neighbors: YNN(i, j) holds the nearest cluster j of each point i (n rows). INSERT INTO YNN SELECT YD.i, YD.j FROM YD, (SELECT i, min(dist) AS mindist FROM YD GROUP BY i) YMIND WHERE YD.i = YMIND.i AND YD.dist = YMIND.mindist;
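If a point is exactly as far from two centroids, the slide's query inserts one YNN row for each tied cluster. The sketch below (SQLite via Python, hypothetical distances) keeps the smallest cluster number j in that case. This tie-break rule is an assumption added here and is not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")
# Hypothetical distances; point 3 is equally far from both clusters.
conn.executemany("INSERT INTO YD VALUES (?, ?, ?)", [
    (1, 1, 1.0), (1, 2, 113.0),
    (2, 1, 85.0), (2, 2, 1.0),
    (3, 1, 4.0), (3, 2, 4.0),
])
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
# Keep the rows at each point's minimum distance; on a tie take the smallest j,
# so that every point is assigned to exactly one cluster.
conn.execute("""
    INSERT INTO YNN
    SELECT YD.i, min(YD.j)
    FROM YD
    JOIN (SELECT i, min(dist) AS mindist FROM YD GROUP BY i) AS YMIND
      ON YD.i = YMIND.i AND YD.dist = YMIND.mindist
    GROUP BY YD.i
""")
nearest = conn.execute("SELECT i, j FROM YNN ORDER BY i").fetchall()
print(nearest)  # [(1, 1), (2, 2), (3, 1)]
```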
  • 18. Abusing SQL Count Points Per Cluster: W(j, w) holds the fraction of points in each cluster. INSERT INTO W SELECT j, count(*) FROM YNN GROUP BY j; UPDATE W SET w = w / (SELECT n FROM model);
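Here is a runnable version of this step in SQLite via Python, using a hypothetical assignment of n = 4 points. Declaring w as REAL is an assumption made here so that count(*) / n is not truncated by integer division. n is read through a scalar subquery because SQLite, like standard SQL, does not let an UPDATE refer to another table's column directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (d INTEGER, k INTEGER, n INTEGER, iteration INTEGER, avg_q REAL)")
conn.execute("INSERT INTO model VALUES (2, 2, 4, 0, NULL)")
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.executemany("INSERT INTO YNN VALUES (?, ?)", [(1, 1), (2, 1), (3, 1), (4, 2)])

# REAL affinity stores the integer counts as floats, so the division below is exact.
conn.execute("CREATE TABLE W (j INTEGER, w REAL)")
conn.execute("INSERT INTO W SELECT j, count(*) FROM YNN GROUP BY j")
conn.execute("UPDATE W SET w = w / (SELECT n FROM model)")
weights = conn.execute("SELECT j, w FROM W ORDER BY j").fetchall()
print(weights)  # [(1, 0.75), (2, 0.25)]
```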
  • 19. Abusing SQL Compute New Centroids, replacing the previous ones in C: DELETE FROM C; INSERT INTO C SELECT l, j, avg(YV.val) FROM YV, YNN WHERE YV.i = YNN.i GROUP BY l, j;
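The new centroids have to replace the old ones in C. Otherwise C holds two rows per (l, j), and the next distance step mixes old and new centroids. The sketch below (SQLite via Python, hypothetical data) clears C and refills it inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")
# Hypothetical points (1, 1) and (1, 3) in cluster 1, (8, 8) in cluster 2.
conn.executemany("INSERT INTO YV VALUES (?, ?, ?)", [
    (1, 1, 1.0), (1, 2, 1.0), (2, 1, 1.0), (2, 2, 3.0), (3, 1, 8.0), (3, 2, 8.0),
])
conn.executemany("INSERT INTO YNN VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
# Old centroids from a previous iteration.
conn.executemany("INSERT INTO C VALUES (?, ?, ?)",
                 [(1, 1, 0.0), (2, 1, 0.0), (1, 2, 5.0), (2, 2, 5.0)])

with conn:  # the delete and the re-insert are committed together
    conn.execute("DELETE FROM C")
    conn.execute("""
        INSERT INTO C
        SELECT YV.l, YNN.j, avg(YV.val)
        FROM YV JOIN YNN ON YV.i = YNN.i
        GROUP BY YV.l, YNN.j
    """)
centroids = conn.execute("SELECT l, j, val FROM C ORDER BY j, l").fetchall()
print(centroids)  # [(1, 1, 1.0), (2, 1, 2.0), (1, 2, 8.0), (2, 2, 8.0)]
```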
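  • 20. Abusing SQL Compute Variances: for each dimension l and cluster j, R holds the mean squared deviation of the cluster's points from its centroid. INSERT INTO R SELECT C.l, C.j, avg((YV.val - C.val)**2) FROM C, YV, YNN WHERE YV.i = YNN.i AND YV.l = C.l AND YNN.j = C.j GROUP BY C.l, C.j;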
  • 21. Abusing SQL Update Model INSERT INTO R SELECT C.l, C.j, avg((YV.val- C.val)**2) FROM C, YV, YNN WHERE YV.i = YNN.i and YV.l = C.l and YNN.j = C.j GROUP BY C.l, C.j;
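Putting the steps together, the driver from the General Approach slide can be sketched as a Python loop. It sends SQLite statements to the database and reads back the cluster assignment as feedback. This is a hypothetical composition, not the deck's code, and it makes three assumptions. First, the initial centroids are the first k points, so that runs are deterministic. Second, the model update increments iteration and stores the mean of R as avg_q; this is a guess at what "Update Model" does. Third, a cluster that receives no points simply drops out of C.

```python
import sqlite3

def kmeans_sql(points, k, max_iter=10):
    """Hypothetical driver: repeats the slide's SQL steps until no point changes cluster."""
    d = len(points[0])
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE model (d INTEGER, k INTEGER, n INTEGER, iteration INTEGER, avg_q REAL)")
    conn.execute("CREATE TABLE YV (i INTEGER, l INTEGER, val REAL)")
    conn.execute("CREATE TABLE C (l INTEGER, j INTEGER, val REAL)")
    conn.execute("CREATE TABLE YD (i INTEGER, j INTEGER, dist REAL)")
    conn.execute("CREATE TABLE YNN (i INTEGER, j INTEGER)")
    conn.execute("CREATE TABLE R (l INTEGER, j INTEGER, q REAL)")
    conn.execute("INSERT INTO model VALUES (?, ?, ?, 0, NULL)", (d, k, len(points)))
    # Flattened points YV(i, l, val), numbered from 1.
    conn.executemany("INSERT INTO YV VALUES (?, ?, ?)",
                     [(i, l, v) for i, p in enumerate(points, 1) for l, v in enumerate(p, 1)])
    # Deterministic initialization (assumption): the first k points become the centroids.
    conn.execute("INSERT INTO C SELECT l, i, val FROM YV WHERE i <= ?", (k,))

    previous = None
    for _ in range(max_iter):
        conn.execute("DELETE FROM YD")
        conn.execute("""INSERT INTO YD
            SELECT YV.i, C.j, sum((YV.val - C.val) * (YV.val - C.val))
            FROM YV JOIN C ON YV.l = C.l GROUP BY YV.i, C.j""")
        conn.execute("DELETE FROM YNN")
        conn.execute("""INSERT INTO YNN
            SELECT YD.i, min(YD.j) FROM YD
            JOIN (SELECT i, min(dist) AS mindist FROM YD GROUP BY i) AS YMIND
              ON YD.i = YMIND.i AND YD.dist = YMIND.mindist
            GROUP BY YD.i""")
        conn.execute("DELETE FROM C")
        conn.execute("""INSERT INTO C
            SELECT YV.l, YNN.j, avg(YV.val) FROM YV JOIN YNN ON YV.i = YNN.i
            GROUP BY YV.l, YNN.j""")
        conn.execute("DELETE FROM R")
        conn.execute("""INSERT INTO R
            SELECT C.l, C.j, avg((YV.val - C.val) * (YV.val - C.val))
            FROM C JOIN YV ON YV.l = C.l JOIN YNN ON YV.i = YNN.i AND YNN.j = C.j
            GROUP BY C.l, C.j""")
        # Hypothetical model update: count the iteration and record the mean variance.
        conn.execute("""UPDATE model SET iteration = iteration + 1,
            avg_q = (SELECT avg(q) FROM R)""")
        # Feedback: the driver stops once the assignment no longer changes.
        assignment = conn.execute("SELECT i, j FROM YNN ORDER BY i").fetchall()
        if assignment == previous:
            break
        previous = assignment
    return conn

conn = kmeans_sql([(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)], k=2)
print(conn.execute("SELECT j, l, val FROM C ORDER BY j, l").fetchall())
# [(1, 1, 1.0), (1, 2, 1.5), (2, 1, 8.5), (2, 2, 8.0)]
```

Every pass through the loop is a round trip between the driver and the database. That round trip is the "Feedback" arrow in the General Approach diagram.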
  • 22. Abusing SQL Let’s not do that again!
  • 23. Painful by Design Why are predictive analytics so hard to express in SQL?
  • 24. Painful by Design #1: No Arrays (diagram: SQL has sets, i.e. rows, and tuples, i.e. columns, but no arrays)
  • 25. Painful by Design #2: Relational Algebra Sucks. Its operators: projection, selection, rename, natural join (R ⋈ S), semijoin (R ⋉ S), antijoin (R ▷ S), division (R ÷ S), theta join (R ⋈θ S), left outer join (R ⟕ S), right outer join (R ⟖ S), full outer join (R ⟗ S), aggregation (G1, G2, ..., Gm g f1(A1'), f2(A2'), ..., fk(Ak') (r)). Lacking: iteration, recursion, multiple dimensions.
  • 26. Database Extensions There’s GOT to be a better way!
  • 27. Database Extensions C Extension
  • 28. Database Extensions UDF (User-Defined Function) ≈ Map: map(a), op2(a, b). UDA (User-Defined Aggregate) ≈ Reduce: init(a), accum(a, b), merge(a, b), final(a).
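The UDF/UDA split can be tried without writing a C extension, using Python's sqlite3 module, which registers Python callables as SQL functions. In this sketch, create_function registers a per-row scalar function, which plays the map role. create_aggregate registers a class whose __init__, step and finalize methods play the roles of init, accum and final. SQLite runs an aggregate in a single process, so it has no merge hook. Engines that aggregate in parallel need merge(a, b) to combine partial states. The function names square and variance_pop are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (2.0,), (3.0,), (4.0,)])

# UDF ("map"): a scalar function applied to each row independently.
conn.create_function("square", 1, lambda a: a * a)

# UDA ("reduce"): __init__ ~ init, step ~ accum, finalize ~ final.
# There is no merge(): sqlite3 feeds every row to a single instance.
class Variance:
    def __init__(self):
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def step(self, a):
        self.n += 1
        self.total += a
        self.total_sq += a * a

    def finalize(self):
        if self.n == 0:
            return None
        mean = self.total / self.n
        # Population variance via E[x^2] - mean^2 (simple, but not numerically robust).
        return self.total_sq / self.n - mean * mean

conn.create_aggregate("variance_pop", 1, Variance)
stats = conn.execute("SELECT sum(square(x)), variance_pop(x) FROM t").fetchone()
print(stats)  # (30.0, 1.25)
```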
  • 29. MADlib MADlib is an open-source library for scalable in-database analytics. It is implemented using database extensions written in C, and is available for PostgreSQL and Greenplum.
  • 30. MADlib 1. Download the binary: Mac OS X: http://www.madlib.net/files/madlib-0.6-Darwin.dmg Linux: http://www.madlib.net/files/madlib-0.6-Linux.rpm
  • 31. MADlib 2. Start the Installation: Mac OS X: double-click the installer. Linux: yum install $MADLIB_PACKAGE --nogpgcheck
  • 32. MADlib 3. Verify Locatability: Greenplum: source /path/to/greenplum/greenplum_path.sh PostgreSQL: make sure psql is in PATH
  • 33. MADlib 4. Register MADlib Greenplum /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install PostgreSQL /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install
  • 34. MADlib 5. Test Installation Greenplum /usr/local/madlib/bin/madpack -p greenplum -c $USER@$HOST/$DATABASE install-check PostgreSQL /usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/$DATABASE install-check
  • 35. MADlib Clustering in MADlib SELECT * FROM kmeans_random( 'rel_source', 'expr_point', k, [ 'fn_dist', 'agg_centroid', max_num_iterations, min_frac_reassigned ] );
  • 36. MADlib Ahhhhhh......
  • 37. MADlib Our Way or the Highway Composability
  • 38. Other Approaches RDBMS Isn’t the Only Game in Town!
  • 39. Other Approaches 1. Embrace Coding • Hadoop Ecosystem • Mahout, Cascading/Scalding, Crunch/Scrunch, Pangool, Cascalog, and, of course, MapReduce • BDAS Ecosystem • Spark
  • 40. Other Approaches 2. Reject RDBMS • Datalog + variants • In theory, ideal for many kinds of predictive analytics • Suffers from a lack of distributed, feature-complete implementations
  • 41. Other Approaches 2. Reject RDBMS • Rasdaman / RASQL • Arrays but not analytics Community Editions http://www.rasdaman.org
  • 42. Other Approaches 2. Reject RDBMS • MonetDB / SciQL • Array extension of SQL • Poor analytics Community Editions http://www.monetdb.org
  • 43. Other Approaches 2. Reject RDBMS • SciDB / AFL (AQL) • Excellent analytics • Limited composability Community Editions http://www.scidb.org/forum/viewtopic.php?f=16&t=364/
  • 44. Other Approaches 2. Reject RDBMS • Precog / Quirrel (simple “R for big data”) • Multidimensional, arrays + functions • Still immature Community Editions http://www.precog.com/editions/precog-for-mongodb (MongoDB) http://www.precog.com/editions/precog-for-postgresql (PostgreSQL)
  • 45. Summary • Increase performance, reduce friction by doing more inside the database • Not a panacea • Hard to do in SQL • Hard to do in C (but you may not have to: MADlib) • Pre-canned & brittle in most databases • Ultimately what’s needed is tech designed for advanced analytics
  • 46. Q&A John A. De Goes @jdegoes, john@precog.com
  • 47. References • Programming the K-means Clustering Algorithm in SQL (Teradata, NCR)