Copyright © 2009 Beat Vontobel
This work is made available under the Creative Commons Attribution-Noncommercial-Share Alike license, see
http://creativecommons.org/licenses/by-nc-sa/3.0/




Solving Common SQL Problems with the SeqEngine
Beat Vontobel, CTO, MeteoNews AG
b.vontobel@meteonews.ch
http://seqengine.org

Solving SQL Problems with the SeqEngine
• How to benefit from simple auxiliary tables holding sequences
• Use of a pluggable storage engine to create such tables
• On the side:
    ‣ Some interesting benchmarks
    ‣ MySQL optimizer caveats
    ‣ Remember once more how to do things the "SQL way"




Sequences: What are we talking about?
CREATE TABLE integer_sequence (
    i INT NOT NULL PRIMARY KEY
);

INSERT INTO integer_sequence (i)
VALUES (1), (2), (3), (4), (5), (6), (7), (8);

SELECT * FROM integer_sequence;
+---+
| i |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
  …

Names used by others…
• Pivot Table
    ‣ Can be used to "pivot" other tables ("turn them around")
• Integers Table
    ‣ They often hold integers as data type
• Auxiliary/Utility Table
    ‣ They help us solve problems, but contain no actual data
• Sequence Table
    ‣ Just what it is: the name I'll use



What we're not talking about (1)
-- Oracle-style sequences
--
-- (mostly used to generate primary keys, much
-- like what MySQL's auto_increment feature is
-- used for)

CREATE SEQUENCE customers_seq
 START WITH     1000
 INCREMENT BY   1;

INSERT INTO customers (customer_id, name)
VALUES (customers_seq.NEXTVAL, 'John Doe');




What we're not talking about (2)
• Sequence in Mathematics:
    ‣ "an ordered list of objects"
    ‣ n-tuple
• Sequence Table in SQL:
    ‣ a set, unordered by definition
        F = {n | 1 ≤ n ≤ 20; n is integer}
    ‣ relation (set of 1-tuples)




"Using such a utility table is a favorite old trick of experienced SQL developers"
(Stéphane Faroult: The Art of SQL)




Finding Holes… typically Swiss!




Finding Holes… in a Table!
+---------+---------------------+------+---
| stat_id | datetime            | tt   | …
+---------+---------------------+------+---
| …       | …                   | …    | …
| ABO     | 2004-11-03 22:40:00 |  8.3 | …
| ABO     | 2004-11-03 22:50:00 |  8.7 | …
| ABO     | 2004-11-03 23:00:00 |  9.9 | …
| ABO     | 2004-11-03 23:10:00 |  7.8 | …
| ABO     | 2004-11-04 00:10:00 |  9.2 | …
| ABO     | 2004-11-04 00:20:00 |  9.1 | …
| ABO     | 2004-11-04 00:30:00 | 10.2 | …
| ABO     | 2004-11-04 00:40:00 |  9.3 | …
| …       | …                   | …    | …
+---------+---------------------+------+---
The table's CREATE statement used for the demo
CREATE TABLE temperatures (
  stat_id  CHAR(3)      NOT NULL,
  datetime TIMESTAMP    NOT NULL,
  tt       DECIMAL(3,1) DEFAULT NULL,

  PRIMARY KEY (stat_id, datetime),
  UNIQUE KEY reverse_primary (datetime, stat_id)
);




How to "SELECT" a row that doesn't exist?
• SELECT only returns rows that are there
• WHERE only filters rows
• We need something to generate rows!




Finding Holes… the naïve way
/* Pseudocode */
for (/* all timestamps to check */) {

    /* Single SELECT for every timestamp */
    db_query("SELECT COUNT(*)
              FROM temperatures
              WHERE stat_id = ? AND datetime = ?");

    if (/* COUNT(*) is 0 */) {
       warn_about_missing_row(/* timestamp */);
    }
}




Finding Holes… the "standard" way
/* Pseudocode: working with an ordered set */
db_query("SELECT      datetime
          FROM        temperatures
          WHERE       stat_id = ?
          ORDER BY    datetime ASC");

for (/* all timestamps to check */) {

    db_fetch_row();

    while (/* expected and fetched timestamps don't match */) {
       warn_about_missing_row();
       increment_timestamp();
    }
}
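
The ordered walk above can be sketched in runnable form. This is only an illustration in Python, not the code behind the slides: the function name `find_holes`, the half-open range `[start, end)` and the sample data are assumptions, and a sorted list stands in for the ORDER BY result set.

```python
from datetime import datetime, timedelta


def find_holes(present, start, end, step):
    """Return expected timestamps in [start, end) that are absent from `present`.

    `present` must be sorted ascending, like the ORDER BY result set.
    """
    missing = []
    rows = iter(present)
    row = next(rows, None)          # first "fetch"
    expected = start
    while expected < end:
        # Skip fetched rows that lie before the expected timestamp.
        while row is not None and row < expected:
            row = next(rows, None)
        if row == expected:
            row = next(rows, None)  # match: fetch the next row
        else:
            missing.append(expected)
        expected += step
    return missing


step = timedelta(minutes=10)
present = [
    datetime(2004, 11, 3, 23, 0),
    datetime(2004, 11, 3, 23, 10),
    datetime(2004, 11, 4, 0, 10),
]
holes = find_holes(
    present,
    datetime(2004, 11, 3, 23, 0),
    datetime(2004, 11, 4, 0, 20),
    step,
)
for ts in holes:
    print(ts)
```

Each row is fetched once and each expected timestamp is visited once, so the walk is a single merge pass over both sequences.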


These were just ordinary JOINs!
• The for loop just walks an "imaginary" timestamps table holding a sequence of all the values to check for!
• Then we LEFT JOIN these timestamps against our temperatures
    ‣ or do a NOT EXISTS subquery
• So, if we had a sequence table…




Sequence of timestamps
CREATE TABLE timestamps (
   datetime TIMESTAMP NOT NULL PRIMARY KEY
);

INSERT INTO timestamps (datetime)
VALUES ('2004-01-01 00:00:00'),
       ('2004-01-01 00:10:00'),
       …;

SELECT * FROM timestamps;
+---------------------+
| datetime            |
+---------------------+
| 2004-01-01 00:00:00 |
| 2004-01-01 00:10:00 |
| …                   |

Queries using the sequence
SELECT      *
FROM        timestamps                -- our "for loop"
LEFT JOIN   temperatures
ON          timestamps.datetime = temperatures.datetime
WHERE       temperatures.datetime IS NULL;

SELECT      *
FROM        timestamps                -- our "for loop"
WHERE       NOT EXISTS (
                SELECT   *
                FROM     temperatures
                WHERE    temperatures.datetime
                             = timestamps.datetime
            );



Finding missing rows
WHERE temperatures.stat_id IS NULL

timestamps          | temperatures
datetime            | stat_id | datetime            | tt
--------------------+---------+---------------------+------
…                   | …       | …                   | …
2004-11-03 23:00:00 | ABO     | 2004-11-03 23:00:00 | 9.9
2004-11-03 23:10:00 | ABO     | 2004-11-03 23:10:00 | 7.8
2004-11-03 23:20:00 | NULL    | NULL                | NULL
2004-11-03 23:30:00 | NULL    | NULL                | NULL
2004-11-03 23:40:00 | NULL    | NULL                | NULL
2004-11-03 23:50:00 | NULL    | NULL                | NULL
2004-11-04 00:00:00 | NULL    | NULL                | NULL
2004-11-04 00:10:00 | ABO     | 2004-11-04 00:10:00 | 9.2
…                   | …       | …                   | …


Filling sequence tables: "Manually"
• INSERT from an external loop (or a stored procedure)
• "Explode" a few rows using CROSS JOINs
    ‣ INSERT INTO i VALUES (1), (2), …, (8), (9), (10);
    ‣ INSERT INTO j
        SELECT u.i * 10 + v.i
        FROM i AS u
        CROSS JOIN i AS v;
• "Pop quiz: generate 1 million records" (Giuseppe Maxia)
    ‣ http://datacharmer.blogspot.com/2007/12/pop-quiz-generate-1-million-records.html
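
A minimal runnable sketch of the CROSS JOIN "explosion", for readers who want to try it without a MySQL server. Python's built-in sqlite3 stands in for MySQL here; the table name `digits`, the use of the digits 0 to 9 and the `+ 1` offset are assumptions chosen so that the result is exactly 1 to 1000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ten seed rows: the digits 0..9 (hypothetical helper table).
conn.execute("CREATE TABLE digits (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO digits (i) VALUES (?)", [(d,) for d in range(10)])

# Three copies of the ten-row table give 10 * 10 * 10 = 1000 combinations.
conn.execute("CREATE TABLE integer_sequence (i INTEGER NOT NULL PRIMARY KEY)")
conn.execute("""
    INSERT INTO integer_sequence (i)
    SELECT h.i * 100 + t.i * 10 + u.i + 1
    FROM digits AS h
    CROSS JOIN digits AS t
    CROSS JOIN digits AS u
""")

count, lowest, highest = conn.execute(
    "SELECT COUNT(*), MIN(i), MAX(i) FROM integer_sequence"
).fetchone()
print(count, lowest, highest)
```

Every additional CROSS JOIN of the seed table multiplies the row count by ten, so six copies would reach one million rows.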


…or, just use the SeqEngine

  -- http://seqengine.org
  -- See the README for build instructions

  INSTALL PLUGIN SeqEngine SONAME 'ha_seqengine.so';

  SHOW PLUGINS; SHOW ENGINES;

  CREATE TABLE million (i TIMESTAMP NOT NULL)
  ENGINE=SeqEngine CONNECTION='1;1000000;1';

  -- If you want to, materialize it now (fast!)
  ALTER TABLE million ENGINE=MyISAM;




Syntax
-- Variable parts: table_name, column_name, the column type,
-- the optional PRIMARY KEY and start;end;increment

CREATE TABLE table_name (
   column_name {INT|TIMESTAMP} NOT NULL [PRIMARY KEY]
) ENGINE=SeqEngine CONNECTION='start;end;increment';




"Manually" created: Disadvantages
• Wastes storage
• Wastes RAM (for caches or if ENGINE=MEMORY)
• Wastes I/O
• Wastes CPU (unnecessary overhead in code)
• Cumbersome to fill (especially if large)




SeqEngine: Disadvantages
• None
• …other than:
    ‣ So far it's just a really quick hack for this presentation
    ‣ Contains ugly code and probably a lot of bugs
    ‣ Coded in C++ by somebody who's never done C++ before
    ‣ Is not part of the core server: go build it yourself!




Limitations of the SeqEngine (v0.1)
• Not real limitations, but due to the concept:
    ‣ Read-only
    ‣ One column maximum
    ‣ UNIQUE keys only
• Current limitations:
    ‣ INT and TIMESTAMP only
    ‣ Only full key reads
    ‣ Error checking, clean-up, optimization, bugs…



The tiny core of the SeqEngine: Init
int ha_seqengine::rnd_init(bool scan)
{
    DBUG_ENTER("ha_seqengine::rnd_init");

    rnd_cursor_pos = share->seq_def.seq_start;

    DBUG_RETURN(0);
}




The tiny core of the SeqEngine: Next row
int ha_seqengine::rnd_next(uchar *buf)
{
    DBUG_ENTER("ha_seqengine::rnd_next");

    if(rnd_cursor_pos <= share->seq_def.seq_end) {
        build_row(buf, rnd_cursor_pos);
        rnd_cursor_pos += share->seq_def.seq_inc;
        table->status= 0;
        DBUG_RETURN(0);
    }

    table->status= STATUS_NOT_FOUND;
    DBUG_RETURN(HA_ERR_END_OF_FILE);
}
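
The scan logic of rnd_init() and rnd_next() can be mimicked in a few lines of Python. This is a sketch under assumptions, not the engine's code: the helper names `parse_connection` and `scan` are made up, the way the real engine parses the CONNECTION string is not shown on the slides, and the restriction to positive increments exists only in this sketch.

```python
def parse_connection(connection):
    """Split a 'start;end;increment' string into three integers (assumed format)."""
    start, end, increment = (int(part) for part in connection.split(";"))
    return start, end, increment


def scan(connection):
    """Yield the rows of a sequence table, mirroring rnd_init()/rnd_next()."""
    start, end, increment = parse_connection(connection)
    if increment <= 0:
        raise ValueError("increment must be positive in this sketch")
    cursor_pos = start              # rnd_init(): reset the cursor to seq_start
    while cursor_pos <= end:        # rnd_next(): seq_end is inclusive
        yield cursor_pos            # corresponds to build_row(buf, pos)
        cursor_pos += increment
    # Leaving the loop corresponds to returning HA_ERR_END_OF_FILE.


rows = list(scan("1;3;1"))
print(rows)
```

No row is ever stored: the cursor position is the only state, which is why such a table costs neither disk space nor cache memory.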



SeqEngine: The BOF
• Using the Storage Engine API for small projects
• Additional questions/discussion
    ‣ Wednesday, April 22
    ‣ 8:30 pm
    ‣ Ballroom E




Back to the missing rows example…
SeqEngine (LEFT JOIN)
SELECT    timestamps.datetime, stations.stat_id

FROM      timestamps CROSS JOIN stations

LEFT JOIN temperatures AS temps
ON        (temps.datetime, temps.stat_id)
            = (timestamps.datetime, stations.stat_id)

WHERE     stations.stat_id = 'ABO'
AND       temps.stat_id IS NULL;




SeqEngine (NOT EXISTS)
SELECT    timestamps.datetime, stations.stat_id

FROM      timestamps CROSS JOIN stations

WHERE     stations.stat_id = 'ABO'

AND NOT EXISTS (
     SELECT *
     FROM   temperatures AS temps
     WHERE (temps.datetime, temps.stat_id)
            = (timestamps.datetime, stations.stat_id)
);
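
Both query shapes can be tried on a tiny data set. The sketch below uses Python's built-in sqlite3 in place of MySQL and ordinary tables in place of SeqEngine tables; both swaps, the sample rows and the explicit AND comparisons (instead of the row-value comparison on the slides) are assumptions made to keep it portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timestamps (datetime TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE stations (stat_id TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE temperatures (
        stat_id  TEXT NOT NULL,
        datetime TEXT NOT NULL,
        tt       REAL,
        PRIMARY KEY (stat_id, datetime)
    );
    INSERT INTO stations VALUES ('ABO');
    INSERT INTO timestamps VALUES
        ('2004-11-03 23:00:00'), ('2004-11-03 23:10:00'),
        ('2004-11-03 23:20:00'), ('2004-11-03 23:30:00'),
        ('2004-11-03 23:40:00'), ('2004-11-03 23:50:00'),
        ('2004-11-04 00:00:00'), ('2004-11-04 00:10:00');
    INSERT INTO temperatures VALUES
        ('ABO', '2004-11-03 23:00:00', 9.9),
        ('ABO', '2004-11-03 23:10:00', 7.8),
        ('ABO', '2004-11-04 00:10:00', 9.2);
""")

# Variant 1: LEFT JOIN, keep only the rows without a partner.
left_join = conn.execute("""
    SELECT timestamps.datetime
    FROM timestamps CROSS JOIN stations
    LEFT JOIN temperatures AS temps
      ON temps.datetime = timestamps.datetime
     AND temps.stat_id = stations.stat_id
    WHERE stations.stat_id = 'ABO'
      AND temps.stat_id IS NULL
    ORDER BY timestamps.datetime
""").fetchall()

# Variant 2: NOT EXISTS with a correlated subquery.
not_exists = conn.execute("""
    SELECT timestamps.datetime
    FROM timestamps CROSS JOIN stations
    WHERE stations.stat_id = 'ABO'
      AND NOT EXISTS (
          SELECT 1 FROM temperatures AS temps
          WHERE temps.datetime = timestamps.datetime
            AND temps.stat_id = stations.stat_id
      )
    ORDER BY timestamps.datetime
""").fetchall()

missing = [row[0] for row in left_join]
print(missing)
```

Both variants return the same five missing timestamps; which one runs faster depends on the server and its optimizer.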




As a Procedure (Single SELECTs)
DELIMITER //

CREATE PROCEDURE find_holes_naive(stat CHAR(3))
BEGIN
    DECLARE dt DATETIME DEFAULT '2004-01-01 00:00:00';
    DECLARE c INT;

    -- One SELECT per expected timestamp
    WHILE dt < '2005-01-01 00:00:00' DO
        SELECT COUNT(*) INTO c
        FROM   temperatures
        WHERE  (stat_id, datetime) = (stat, dt);

        IF c = 0 THEN -- missing row
            SELECT stat, dt;
        END IF;

        SET dt = dt + INTERVAL 10 MINUTE;
    END WHILE;
END //

DELIMITER ;
As a Procedure (Ordered Set)
DELIMITER //

CREATE PROCEDURE find_holes_ordered(stat CHAR(3))
BEGIN
    DECLARE no_more_rows BOOLEAN DEFAULT FALSE;
    DECLARE dt1 DATETIME DEFAULT '2004-01-01 00:00:00';  -- expected timestamp
    DECLARE dt2 DATETIME;                                -- fetched timestamp

    DECLARE temperatures_cursor CURSOR FOR
        SELECT datetime FROM temperatures
        WHERE stat_id = stat ORDER BY datetime ASC;

    -- Lets FETCH run past the last row without aborting the procedure
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_rows = TRUE;

    OPEN temperatures_cursor;

    temperatures_loop: LOOP
        FETCH temperatures_cursor INTO dt2;

        WHILE dt1 != dt2 DO
            SELECT stat, dt1;  -- missing row
            SET dt1 = dt1 + INTERVAL 10 MINUTE;
            IF dt1 >= '2005-01-01 00:00:00' THEN LEAVE temperatures_loop; END IF;
        END WHILE;

        SET dt1 = dt1 + INTERVAL 10 MINUTE;
        IF dt1 >= '2005-01-01 00:00:00' THEN LEAVE temperatures_loop; END IF;
    END LOOP temperatures_loop;

    CLOSE temperatures_cursor;
END //

DELIMITER ;




Self-Reference (LEFT self-JOIN)
SELECT    *
FROM      temperatures

LEFT JOIN temperatures AS missing

ON        temperatures.stat_id = missing.stat_id
AND       temperatures.datetime + INTERVAL 10 MINUTE
               = missing.datetime

WHERE     temperatures.stat_id = 'ABO'
AND       missing.datetime IS NULL;




Self-Reference (NOT EXISTS)
SELECT *
FROM   temperatures

WHERE   NOT EXISTS (

            SELECT *
            FROM   temperatures AS missing
            WHERE missing.datetime
                       = temperatures.datetime
                         + INTERVAL 10 MINUTE
            AND    missing.stat_id
                       = temperatures.stat_id

        )

AND     stat_id = 'ABO';
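
What the self-referencing queries actually return is easiest to see on a few rows. The sketch below runs the NOT EXISTS variant in Python's built-in sqlite3 rather than MySQL (an assumption, as are the sample rows and the extra `first_missing` column); SQLite's `datetime(x, '+10 minutes')` replaces `+ INTERVAL 10 MINUTE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temperatures (
        stat_id  TEXT NOT NULL,
        datetime TEXT NOT NULL,
        tt       REAL,
        PRIMARY KEY (stat_id, datetime)
    );
    INSERT INTO temperatures VALUES
        ('ABO', '2004-11-03 23:00:00', 9.9),
        ('ABO', '2004-11-03 23:10:00', 7.8),
        ('ABO', '2004-11-04 00:10:00', 9.2),
        ('ABO', '2004-11-04 00:20:00', 9.1);
""")

# Rows whose successor (10 minutes later) does not exist.
rows = conn.execute("""
    SELECT t.datetime,
           datetime(t.datetime, '+10 minutes') AS first_missing
    FROM temperatures AS t
    WHERE t.stat_id = 'ABO'
      AND NOT EXISTS (
          SELECT 1 FROM temperatures AS missing
          WHERE missing.stat_id = t.stat_id
            AND missing.datetime = datetime(t.datetime, '+10 minutes')
      )
    ORDER BY t.datetime
""").fetchall()
print(rows)
```

Without a sequence table the query can only report the row before each gap, so a gap of any length shows up once. The last row of the table is reported as well, because its successor lies beyond the data.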

What's the performance?
• SeqEngine
    ‣ LEFT JOIN
    ‣ NOT EXISTS
• Self-Reference
    ‣ LEFT self-JOIN
    ‣ NOT EXISTS
• Stored Procedures
    ‣ Naïve (Single SELECTs)
    ‣ Standard (Ordered Set)
The benchmark
All the usual disclaimers for benchmarks apply: Go ahead and measure it with your hardware, your version of MySQL,
your storage engines, your data sets and your server configuration settings.


 #  Query                         Remarks                      Time [s]
 1  SeqEngine (NOT EXISTS)                                     0.28
 2  SeqEngine (LEFT JOIN)                                      0.29
 3  Procedure (Ordered Set)       result set per missing row   0.59
 4  Self-Reference (NOT EXISTS)   only first missing row       0.93
 5  Self-Reference (LEFT JOIN)    only first missing row       1.10
 6  Procedure (Single SELECTs)    result set per missing row   2.80



Lessons to be learned…
• The sequence trick (and SeqEngine) worked
    ‣ It may sometimes pay off to go the extra mile and write a custom storage engine!
• Stored PROCEDUREs with CURSORs can sometimes be damned fast!
• Subquery optimization really did progress in MySQL (at least in some parts, more to come with 6.0)
    ‣ Consider NOT EXISTS over LEFT JOIN




2nd use case: Generate Test Data
mysql> CREATE TABLE large (i INT NOT NULL)
ENGINE=SeqEngine CONNECTION='1;10000000;1';

Query OK, 0 rows affected (0,12 sec)



mysql> ALTER TABLE large ENGINE=MyISAM;

Query OK, 10000000 rows affected (3,27 sec)
Records: 10000000 Duplicates: 0 Warnings: 0




Generating other Sequences from Integers
CREATE VIEW letters AS
   SELECT CHAR(i) FROM integer_sequence;

CREATE VIEW timestamps AS
   SELECT FROM_UNIXTIME(i) FROM integer_sequence;

CREATE VIEW squares AS
   SELECT i*i FROM integer_sequence;

   …
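
The same idea, runnable without a MySQL server: a sketch in Python's built-in sqlite3, where `char(i)` and `datetime(i, 'unixepoch')` stand in for MySQL's `CHAR(i)` and `FROM_UNIXTIME(i)`. The view `unix_times`, the column aliases and the sample range 65 to 69 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE integer_sequence (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany(
    "INSERT INTO integer_sequence (i) VALUES (?)",
    [(i,) for i in range(65, 70)],
)

# Each view derives a new sequence from the integers on the fly.
conn.executescript("""
    CREATE VIEW letters AS
        SELECT char(i) AS letter FROM integer_sequence;
    CREATE VIEW squares AS
        SELECT i * i AS square FROM integer_sequence;
    CREATE VIEW unix_times AS
        SELECT datetime(i, 'unixepoch') AS ts FROM integer_sequence;
""")

letters = [r[0] for r in conn.execute("SELECT letter FROM letters ORDER BY letter")]
squares = [r[0] for r in conn.execute("SELECT square FROM squares ORDER BY square")]
times = [r[0] for r in conn.execute("SELECT ts FROM unix_times ORDER BY ts")]
print(letters, squares, times[0])
```

Because these are views, nothing is stored twice: the derived sequences are computed from the integer sequence whenever they are queried.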




Generate very large and complex data sets
INSERT INTO customers
SELECT      i                     AS customer_id,
            MD5(i)                AS customer_name,
            ROUND(RAND()*80+1920) AS customer_year
FROM        large;

SELECT * FROM customers;
+-------------+---------------------+---------------+
| customer_id | customer_name       | customer_year |
+-------------+---------------------+---------------+
|           1 | c4ca4238a0b923820d… |          1935 |
|           2 | c81e728d9d4c2f636f… |          1967 |
|           … | …                   |             … |
+-------------+---------------------+---------------+
10000000 rows in set
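
A scaled-down, runnable sketch of the same INSERT … SELECT, using Python's built-in sqlite3 instead of MySQL. SQLite has neither MD5() nor RAND(), so both are registered from Python here; these registrations, the fixed seed, the column types and the reduced size of 1000 rows are assumptions.

```python
import hashlib
import random
import sqlite3

conn = sqlite3.connect(":memory:")

# Register stand-ins for MySQL's MD5() and RAND() (not built into SQLite).
conn.create_function("md5", 1, lambda v: hashlib.md5(str(v).encode()).hexdigest())
rng = random.Random(42)  # fixed seed so repeated runs give the same years
conn.create_function("rand", 0, rng.random)

# A small ordinary table plays the role of the SeqEngine table `large`.
conn.execute("CREATE TABLE large (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO large (i) VALUES (?)", [(i,) for i in range(1, 1001)])

conn.execute("""
    CREATE TABLE customers (
        customer_id   INTEGER NOT NULL PRIMARY KEY,
        customer_name TEXT    NOT NULL,
        customer_year INTEGER NOT NULL
    )
""")
conn.execute("""
    INSERT INTO customers
    SELECT i,
           md5(i),
           CAST(ROUND(rand() * 80 + 1920) AS INTEGER)
    FROM large
""")

total, min_year, max_year = conn.execute(
    "SELECT COUNT(*), MIN(customer_year), MAX(customer_year) FROM customers"
).fetchone()
first_name = conn.execute(
    "SELECT customer_name FROM customers WHERE customer_id = 1"
).fetchone()[0]
print(total, min_year, max_year, first_name)
```

One row of test data is produced per row of the sequence, so the size of the sequence table alone controls the size of the generated data set.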
"Salvage" a bad design
One-to-Many gone wrong:

 Table `users`
+----------+--------+--------+------+
| username | sel1   | sel2   | sel3 |
+----------+--------+--------+------+
| john     | apple  | orange | pear |
| bill     | NULL   | NULL   | NULL |
| emma     | banana | pear   | NULL |
+----------+--------+--------+------+




"Salvage" a bad design
CREATE TABLE salvage (
    col INT NOT NULL
) ENGINE=SeqEngine CONNECTION='1;3;1';

+-----+
| col |
+-----+
|   1 |
|   2 |
|   3 |
+-----+




"Multiply" the rows with a cartesian JOIN
mysql> SELECT * FROM users CROSS JOIN salvage;

+----------+--------+--------+------+-----+
| username | sel1   | sel2   | sel3 | col |
+----------+--------+--------+------+-----+
| bill     | NULL   | NULL   | NULL |   1 |
| bill     | NULL   | NULL   | NULL |   2 |
| bill     | NULL   | NULL   | NULL |   3 |
| emma     | banana | pear   | NULL |   3 |
| emma     | banana | pear   | NULL |   1 |
| emma     | banana | pear   | NULL |   2 |
| john     | apple  | orange | pear |   1 |
| john     | apple  | orange | pear |   2 |
| john     | apple  | orange | pear |   3 |
+----------+--------+--------+------+-----+
9 rows in set (0,00 sec)

Normalized on the fly
SELECT   username,
         CASE col WHEN 1 THEN sel1
                   WHEN 2 THEN sel2
                   WHEN 3 THEN sel3
         END AS sel
FROM     users CROSS JOIN salvage
HAVING   sel IS NOT NULL;

+----------+--------+
| username | sel    |
+----------+--------+
| john     | apple  |
| emma     | banana |
| john     | orange |
| emma     | pear   |
| john     | pear   |
+----------+--------+
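
The same normalization can be tried without the SeqEngine plugin. The following is a minimal sketch, assuming column types the slides do not show (VARCHAR(5) and VARCHAR(10)) and using an ordinary table as a stand-in for the SeqEngine-backed `salvage`. It filters in a WHERE clause on a derived table, which is portable SQL, whereas HAVING on a column alias without GROUP BY relies on a MySQL extension.

```sql
-- Assumed schema: the slides only show the data, not the column types.
CREATE TABLE users (
    username VARCHAR(5)  NOT NULL PRIMARY KEY,
    sel1     VARCHAR(10) NULL,
    sel2     VARCHAR(10) NULL,
    sel3     VARCHAR(10) NULL
);

INSERT INTO users (username, sel1, sel2, sel3) VALUES
    ('john', 'apple',  'orange', 'pear'),
    ('bill', NULL,     NULL,     NULL),
    ('emma', 'banana', 'pear',   NULL);

-- Plain-table stand-in for the SeqEngine table (same rows 1..3).
CREATE TABLE salvage (col INT NOT NULL PRIMARY KEY);
INSERT INTO salvage (col) VALUES (1), (2), (3);

-- One output row per (user, column number); empty slots are dropped.
SELECT username, sel
FROM (
    SELECT username,
           col,
           CASE col WHEN 1 THEN sel1
                    WHEN 2 THEN sel2
                    WHEN 3 THEN sel3
           END AS sel
    FROM   users CROSS JOIN salvage
) AS unpivoted
WHERE  sel IS NOT NULL
ORDER BY username, col;
```

With the three sample users this should return five rows: emma/banana, emma/pear, john/apple, john/orange and john/pear.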

Comma-Separated Attribute Lists
mysql> DESCRIBE selections;
+------------+--------------+------+-----+---------+
| Field      | Type         | Null | Key | Default |
+------------+--------------+------+-----+---------+
| username   | varchar(5)   | NO   | PRI | NULL    |
| selections | varchar(255) | NO   |     | NULL    |
+------------+--------------+------+-----+---------+

mysql> SELECT * FROM selections;
+----------+-------------------+
| username | selections        |
+----------+-------------------+
| john     | apple,orange,pear |
| bill     |                   |
| emma     | banana,pear       |
+----------+-------------------+

Querying Comma-Separated Attribute Lists
SELECT   username,
         SUBSTRING_INDEX(
            SUBSTRING_INDEX(
                selections,
                ',', i
            ),
            ',', -1
         ) AS selection
FROM     selections
JOIN     integers
HAVING   selection NOT LIKE '';

Querying Comma-Separated Attribute Lists
SELECT   username,
         -- Take last element
         SUBSTRING_INDEX(

            -- Crop list after element i
            SUBSTRING_INDEX(
                -- Add empty sentinel element
                CONCAT(selections, ','),
                ',', i
            ),

             ',', -1
         ) AS selection
FROM     selections
JOIN     integers
HAVING   selection NOT LIKE '';
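
To see why the sentinel matters, the nested calls can be evaluated by hand for john's list. This is a small illustrative trace that uses literal values in place of the `selections` and `i` columns. The results in the comments follow from the documented behaviour of SUBSTRING_INDEX, which returns the whole string when the count exceeds the number of delimiters.

```sql
-- i = 2: crop the list after the second element, then take the last one
SELECT SUBSTRING_INDEX('apple,orange,pear,', ',', 2);   -- 'apple,orange'
SELECT SUBSTRING_INDEX('apple,orange', ',', -1);        -- 'orange'

-- i = 4 (past the end), with sentinel: the inner call returns the whole
-- string 'apple,orange,pear,', so the last element is the empty string
SELECT SUBSTRING_INDEX(
           SUBSTRING_INDEX(CONCAT('apple,orange,pear', ','), ',', 4),
           ',', -1
       );                                               -- ''

-- i = 4 without sentinel: the last real element is returned a second time
SELECT SUBSTRING_INDEX(
           SUBSTRING_INDEX('apple,orange,pear', ',', 4),
           ',', -1
       );                                               -- 'pear'
```

The empty results are what `HAVING selection NOT LIKE ''` discards, so every `i` beyond the length of a list drops out of the result.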

Counting members from attribute lists
SELECT   SUBSTRING_INDEX(
              SUBSTRING_INDEX(
                  CONCAT(selections, ','), ',', i
              ), ',', -1
         ) AS selection,
         COUNT(*)
FROM     selections JOIN integers
GROUP BY selection
HAVING   selection NOT LIKE '';
+-----------+----------+
| selection | COUNT(*) |
+-----------+----------+
| apple     |        1 |
| banana    |        1 |
| orange    |        1 |
| pear      |        2 |
+-----------+----------+

Problem: Variable-sized IN-Predicates
• A single prepared statement can't cover variable-sized lists in the IN clause:
    ‣ SELECT * FROM x WHERE a IN (?)
• One needs a separate statement for every list length:
    ‣ SELECT * FROM x WHERE a IN (?)
    ‣ SELECT * FROM x WHERE a IN (?, ?)
    ‣ SELECT * FROM x WHERE a IN (?, ?, ?, …)
• Example from Stéphane Faroult: „The Art of SQL“, adapted for MySQL


Split arguments as before!
SELECT     …
FROM       rental
INNER JOIN customer ON rental.customer_id = …
INNER JOIN address ON …
…
INNER JOIN (
  SELECT SUBSTRING_INDEX(
           SUBSTRING_INDEX(CONCAT(?, ","), ",", i),
           ",", -1
         ) AS customer_id
  FROM   sequences.integers
  WHERE  i <= ?
) AS s ON rental.customer_id = s.customer_id
…
WHERE …;
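
A self-contained sketch of the same idea with MySQL's server-side PREPARE and EXECUTE might look as follows. The table `x` and column `a` come from the placeholder query on the previous slide. The sample data, the statement name `by_list`, the user variables and the local `integers` table are assumptions made for illustration. The CAST is also an addition here: it makes the string-to-integer comparison explicit.

```sql
-- Small sequence table (stand-in for sequences.integers)
CREATE TABLE integers (i INT NOT NULL PRIMARY KEY);
INSERT INTO integers (i) VALUES (1), (2), (3), (4), (5), (6), (7), (8);

-- Hypothetical data table
CREATE TABLE x (a INT NOT NULL PRIMARY KEY);
INSERT INTO x (a) VALUES (1), (2), (3), (4), (5);

-- One statement with exactly two placeholders: the list and its length.
-- Single quotes inside the statement text are doubled.
PREPARE by_list FROM
 'SELECT x.a
  FROM   x
  INNER JOIN (
      SELECT CAST(
                 SUBSTRING_INDEX(
                     SUBSTRING_INDEX(CONCAT(?, '',''), '','', i),
                     '','', -1
                 ) AS SIGNED
             ) AS a
      FROM   integers
      WHERE  i <= ?
  ) AS s ON x.a = s.a
  ORDER BY x.a';

SET @list = '2,4,5', @n = 3;
EXECUTE by_list USING @list, @n;   -- rows 2, 4, 5

SET @list = '1,3', @n = 2;
EXECUTE by_list USING @list, @n;   -- rows 1, 3

DEALLOCATE PREPARE by_list;
```

The statement is parsed once and reused for lists of any length, up to the number of rows in the sequence table.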


SQL-String-Parsing beats Query-Parsing!
[Figure: bar chart of execution times in seconds for different numbers of runs (lower is better), comparing Prepared/Sequence with Client-side IN-List at 1000, 10000, 20000 and 30000 runs. Bar labels per run count: 0.5 and 0.6; 5.4 and 6.0; 10.9 and 12.1; 16.3 and 18.1.]

Sequences and SeqEngine: Conclusion
• Use sequences (and the SeqEngine) to, for example:
    ‣ Find missing rows
    ‣ Generate test data
    ‣ Pivot tables
    ‣ Do clever things with „for-Loops“ (String-Parsing etc.)
• http://seqengine.org
    ‣ Slides will be available shortly after the presentation (also on the conference website)


Solving Common Sql Problems With The Seq Engine

  • 1. Copyright © 2009 Beat Vontobel This work is made available under the Creative Commons Attribution-Noncommercial-Share Alike license, see http://creativecommons.org/licenses/by-nc-sa/3.0/ Solving Common SQL Problems with the SeqEngine Beat Vontobel, CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 2. Solving SQL Problems with the SeqEngine How to benefit from simple auxiliary tables holding • sequences Use of a pluggable storage engine to create such tables • On the side: • Some interesting benchmarks ‣ MySQL-Optimizer caveats ‣ Remember once more how to do things the „SQL-way“ ‣ Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 3. Sequences: What are we talking about? CREATE TABLE integer_sequence ( i INT NOT NULL PRIMARY KEY ); INSERT INTO integer_sequence (i) VALUES (1), (2), (3), (4), (5), (6), (7), (8); SELECT * FROM integer_sequence; +---+ |i| +---+ |1| |2| |3| |4| … Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 4. Names used by others… Pivot Table • Can be used to „pivot“ other tables („turn them around“) ‣ Integers Table • They often hold integers as data type ‣ Auxiliary/Utility Table • They help us solve problems, but contain no actual data ‣ Sequence Table • Just what it is: The name I‘ll use ‣ Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 5. What we‘re not talking about (1) -- Oracle Style Sequences -- -- (mostly used to generate primary keys, much -- like what MySQL‘s auto_increment feature is -- used for) CREATE SEQUENCE customers_seq START WITH 1000 INCREMENT BY 1; INSERT INTO customers (customer_id, name) VALUES (customers_seq.NEXTVAL, 'John Doe'); Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 6. What we‘re not talking about (2) Sequence in Mathematics: • „an ordered list of objects“ ‣ n-tuple ‣ Sequence Table in SQL: • a set, unordered by definition ‣ F = {n | 1 ≤ n ≤ 20; n is integer} relation (set of 1-tuples) ‣ Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 7. „Using such a utility table is a favorite old trick of experienced SQL developers“ (Stéphane Faroult: The Art of SQL) Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 8. Finding Holes… typically Swiss! Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 9. Finding Holes… in a Table! +---------+---------------------+------+--- | stat_id | datetime | tt |… +---------+---------------------+------+------+ |… … | … … | | | ABO | 2004-11-03 22:40:00 | 8.3 | … | ABO | 2004-11-03 22:50:00 | 8.7 | | ABO | 2004-11-03 23:00:00 | 9.9 | | ABO | 2004-11-03 23:10:00 | 7.8 | | ABO | 2004-11-04 00:10:00 | 9.2 | | ABO | 2004-11-04 00:20:00 | 9.1 | | ABO | 2004-11-04 00:30:00 | 10.2 | | ABO | 2004-11-04 00:40:00 | 9.3 | | | … |… … … | | | | +---------+---------------------+------+---- Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 10. Finding Holes… in a Table! +---------+---------------------+------+--- | stat_id | datetime | tt |… +---------+---------------------+------+------+ |… … | … … | | | ABO | 2004-11-03 22:40:00 | 8.3 | … | ABO | 2004-11-03 22:50:00 | 8.7 | | ABO | 2004-11-03 23:00:00 | 9.9 | | ABO | 2004-11-03 23:10:00 | 7.8 | | ABO | 2004-11-04 00:10:00 | 9.2 | | ABO | 2004-11-04 00:20:00 | 9.1 | | ABO | 2004-11-04 00:30:00 | 10.2 | | ABO | 2004-11-04 00:40:00 | 9.3 | | | … |… … … | | | | +---------+---------------------+------+---- Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 11. The table‘s create statement used for demo CREATE TABLE temperatures ( stat_id CHAR(3) NOT NULL, datetime TIMESTAMP NOT NULL, tt decimal(3,1) DEFAULT NULL, PRIMARY KEY (stat_id, datetime), UNIQUE KEY reverse_primary (datetime, stat_id) ); Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 12. How to „SELECT“ a row that doesn‘t exist? SELECT only returns rows that are there • WHERE only filters rows • We need something to generate rows! • Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 13. Finding Holes… the naïve way for(„all timestamps to check“) { /* Single SELECTs for every timestamp */ db_query(„SELECT COUNT(*) FROM temperatures WHERE stat_id = ? AND datetime = ?“); if(„no row found“) { warn_about_missing_row(„timestamp“); } } Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 14. Finding Holes… the „standard“ way /* Working with an ordered set */ db_query(„SELECT datetime FROM temperatures WHERE stat_id = ? ORDER BY datetime ASC“); for(„all timestamps to check“) { db_fetch_row(); while(„timestamps don‘t match“) { warn_about_missing_row(); increment_timestamp(); } } Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 15. These were just ordinary JOINs! for-Loop just walks an „imaginary“ timestamps table with a • sequence of all the values to check for! Then we LEFT JOIN these timestamps against our • temperatures or do a NOT EXIST subquery ‣ So, if we had a sequence table… • Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 16. Sequence of timestamps CREATE TABLE timestamps ( datetime TIMESTAMP NOT NULL PRIMARY KEY ); INSERT INTO timestamps (datetime) VALUES ('2004-01-01 00:00:00'), ('2004-01-01 00:00:10'), …; SELECT * FROM timestamps; +---------------------+ | datetime | +---------------------+ | 2004-01-01 00:00:00 | | 2004-01-01 00:00:10 | |… | Beat Vontobel CTO, MeteoNews AG b.vontobel@meteonews.ch http://seqengine.org
  • 17. Queries using the sequence
        SELECT *
        FROM timestamps -- our "for-loop"
        LEFT JOIN temperatures
               ON timestamps.datetime = temperatures.datetime
        WHERE temperatures.datetime IS NULL;

        SELECT *
        FROM timestamps -- our "for-loop"
        WHERE NOT EXISTS (
            SELECT * FROM temperatures
            WHERE temperatures.datetime = timestamps.datetime
        );
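A minimal sketch of the LEFT JOIN variant above, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for MySQL. The two tables are simplified (no `stat_id` column), and the function name `find_missing` is hypothetical.

```python
import sqlite3


def find_missing(present, expected):
    """Return the expected timestamps that have no row in temperatures."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timestamps (datetime TEXT NOT NULL PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE temperatures (datetime TEXT NOT NULL PRIMARY KEY, tt REAL)"
    )
    conn.executemany("INSERT INTO timestamps VALUES (?)", [(t,) for t in expected])
    conn.executemany("INSERT INTO temperatures VALUES (?, ?)", present)
    rows = conn.execute(
        """
        SELECT timestamps.datetime
        FROM timestamps                      -- the "for-loop"
        LEFT JOIN temperatures
               ON timestamps.datetime = temperatures.datetime
        WHERE temperatures.datetime IS NULL  -- keep only unmatched rows
        ORDER BY timestamps.datetime
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


expected = [
    "2004-11-03 23:00:00",
    "2004-11-03 23:10:00",
    "2004-11-03 23:20:00",
    "2004-11-03 23:30:00",
]
present = [("2004-11-03 23:00:00", 9.9), ("2004-11-03 23:10:00", 7.8)]
missing = find_missing(present, expected)
```

The sequence table supplies the rows that the data table lacks, so the "holes" come back as an ordinary result set.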
  • 18. Finding missing rows
        WHERE temperatures.stat_id IS NULL

        +---------------------+---------+---------------------+------+
        | timestamps          | temperatures                         |
        | datetime            | stat_id | datetime            | tt   |
        +---------------------+---------+---------------------+------+
        | …                   | …       | …                   | …    |
        | 2004-11-03 23:00:00 | ABO     | 2004-11-03 23:00:00 | 9.9  |
        | 2004-11-03 23:10:00 | ABO     | 2004-11-03 23:10:00 | 7.8  |
        | 2004-11-03 23:20:00 | NULL    | NULL                | NULL |
        | 2004-11-03 23:30:00 | NULL    | NULL                | NULL |
        | 2004-11-03 23:40:00 | NULL    | NULL                | NULL |
        | 2004-11-03 23:50:00 | NULL    | NULL                | NULL |
        | 2004-11-04 00:00:00 | NULL    | NULL                | NULL |
        | 2004-11-04 00:10:00 | ABO     | 2004-11-04 00:10:00 | 9.2  |
        | …                   | …       | …                   | …    |
        +---------------------+---------+---------------------+------+
  • 19. Filling sequence tables: "Manually"
        • INSERT from an external loop (or a stored procedure)
        • "explode" a few rows using CROSS JOINs
          ‣ INSERT INTO i VALUES (1), (2), …, (8), (9), (10);
          ‣ INSERT INTO j
            SELECT u.i * 10 + v.i FROM i AS u CROSS JOIN i AS v;
        • "Pop quiz: generate 1 million records" (Giuseppe Maxia)
          http://datacharmer.blogspot.com/2007/12/pop-quiz-generate-1-million-records.html
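The "explode" trick can be tried out with a small sketch like the following, assuming SQLite via Python's `sqlite3` module instead of MySQL. As a variation on the slide, the seed table here holds the digits 0 to 9 (not 1 to 10) and is joined three times, so the result is exactly 0 to 999.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Seed table: ten rows, the digits 0..9 (an assumption; the slide uses 1..10).
conn.execute("CREATE TABLE i (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO i VALUES (?)", [(n,) for n in range(10)])

# Each CROSS JOIN multiplies the row count by ten: 10 * 10 * 10 = 1000 rows.
conn.execute("CREATE TABLE j (i INTEGER NOT NULL PRIMARY KEY)")
conn.execute(
    """
    INSERT INTO j
    SELECT u.i * 100 + v.i * 10 + w.i
    FROM i AS u CROSS JOIN i AS v CROSS JOIN i AS w
    """
)

count, lowest, highest = conn.execute(
    "SELECT COUNT(*), MIN(i), MAX(i) FROM j"
).fetchone()
```

Treating each copy of the seed table as one decimal digit keeps the generated values unique, which the PRIMARY KEY on `j` would otherwise reject.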
  • 20. …or, just use the SeqEngine
        -- http://seqengine.org
        -- README for build instructions
        INSTALL PLUGIN SeqEngine SONAME 'ha_seqengine.so';
        SHOW PLUGINS;
        SHOW ENGINES;

        CREATE TABLE million (i TIMESTAMP NOT NULL)
            ENGINE=SeqEngine CONNECTION='1;1000000;1';

        -- If you want to… now it's materialized (fast!)
        ALTER TABLE million ENGINE=MyISAM;
  • 21. Syntax
        -- Variable parts are highlighted
        CREATE TABLE table_name (
            column_name {INT|TIMESTAMP} NOT NULL [PRIMARY KEY]
        ) ENGINE=SeqEngine CONNECTION='start;end;increment';
  • 22. "Manually" created: Disadvantages
        • Wastes storage
        • Wastes RAM (for caches or if ENGINE=MEMORY)
        • Wastes I/O
        • Wastes CPU (unnecessary overhead in code)
        • Cumbersome to fill (especially if large)
  • 23. SeqEngine: Disadvantages
        • None
        • …other than:
          ‣ It's (so far) just a really quick hack for this presentation
          ‣ Contains ugly code and probably a lot of bugs
          ‣ Coded in C++ by somebody who's never done C++ before
          ‣ Is not part of the core server – go build it yourself!
  • 24. Limitations of the SeqEngine (v0.1)
        • Not real limitations, but due to the concept:
          ‣ Read-only
          ‣ One column maximum
          ‣ UNIQUE keys only
        • Current limitations:
          ‣ INT and TIMESTAMP only
          ‣ Only full key reads
          ‣ Error checking, clean-up, optimization, bugs…
  • 25. The tiny core of the SeqEngine: Init
        int ha_seqengine::rnd_init(bool scan)
        {
            DBUG_ENTER("ha_seqengine::rnd_init");
            rnd_cursor_pos = share->seq_def.seq_start;
            DBUG_RETURN(0);
        }
  • 26. The tiny core of the SeqEngine: Next row
        int ha_seqengine::rnd_next(uchar *buf)
        {
            DBUG_ENTER("ha_seqengine::rnd_next");
            if(rnd_cursor_pos <= share->seq_def.seq_end) {
                build_row(buf, rnd_cursor_pos);
                rnd_cursor_pos += share->seq_def.seq_inc;
                table->status= 0;
                DBUG_RETURN(0);
            }
            table->status= STATUS_NOT_FOUND;
            DBUG_RETURN(HA_ERR_END_OF_FILE);
        }
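The scan logic of the two handler methods above can be mirrored in a few lines of Python. This is only an illustrative sketch: `parse_connection`, `SeqCursor` and `scan` are hypothetical names, and how the real engine parses the `CONNECTION='start;end;increment'` string is not shown in the slides.

```python
def parse_connection(connection):
    """Split a 'start;end;increment' string into three integers (hypothetical)."""
    start, end, inc = (int(part) for part in connection.split(";"))
    if inc <= 0:
        raise ValueError("increment must be positive")
    return start, end, inc


class SeqCursor:
    """Keeps the scan position, like rnd_cursor_pos in the handler."""

    def __init__(self, start, end, inc):
        self.start, self.end, self.inc = start, end, inc
        self.pos = start

    def rnd_init(self):
        # Rewind to the first value of the sequence.
        self.pos = self.start

    def rnd_next(self):
        # Return the next value, or None once the inclusive end is passed
        # (the counterpart of HA_ERR_END_OF_FILE).
        if self.pos <= self.end:
            value = self.pos
            self.pos += self.inc
            return value
        return None


def scan(connection):
    """Run one full table scan and collect all generated values."""
    cursor = SeqCursor(*parse_connection(connection))
    cursor.rnd_init()
    values = []
    value = cursor.rnd_next()
    while value is not None:
        values.append(value)
        value = cursor.rnd_next()
    return values
```

For example, `scan("1;10;3")` yields 1, 4, 7 and 10: the end value is included because the comparison is `<=`, as in `rnd_next` above.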
  • 27. SeqEngine: The BOF
        • Using the Storage Engine API for small projects
        • Additional questions/discussion
          ‣ Wednesday, April 22
          ‣ 20:30 (8:30 pm)
          ‣ Ballroom E
  • 28. Back to the missing rows example…
        WHERE temperatures.stat_id IS NULL

        +---------------------+---------+---------------------+------+
        | timestamps          | temperatures                         |
        | datetime            | stat_id | datetime            | tt   |
        +---------------------+---------+---------------------+------+
        | …                   | …       | …                   | …    |
        | 2004-11-03 23:00:00 | ABO     | 2004-11-03 23:00:00 | 9.9  |
        | 2004-11-03 23:10:00 | ABO     | 2004-11-03 23:10:00 | 7.8  |
        | 2004-11-03 23:20:00 | NULL    | NULL                | NULL |
        | 2004-11-03 23:30:00 | NULL    | NULL                | NULL |
        | 2004-11-03 23:40:00 | NULL    | NULL                | NULL |
        | 2004-11-03 23:50:00 | NULL    | NULL                | NULL |
        | 2004-11-04 00:00:00 | NULL    | NULL                | NULL |
        | 2004-11-04 00:10:00 | ABO     | 2004-11-04 00:10:00 | 9.2  |
        | …                   | …       | …                   | …    |
        +---------------------+---------+---------------------+------+
  • 29. SeqEngine (LEFT JOIN)
        SELECT timestamps.datetime, stations.stat_id
        FROM timestamps CROSS JOIN stations
        LEFT JOIN temperatures AS temps
               ON (temps.datetime, temps.stat_id)
                = (timestamps.datetime, stations.stat_id)
        WHERE stations.stat_id = 'ABO'
          AND temps.stat_id IS NULL;
  • 30. SeqEngine (NOT EXISTS)
        SELECT timestamps.datetime, stations.stat_id
        FROM timestamps CROSS JOIN stations
        WHERE stations.stat_id = 'ABO'
          AND NOT EXISTS (
              SELECT * FROM temperatures AS temps
              WHERE (temps.datetime, temps.stat_id)
                  = (timestamps.datetime, stations.stat_id)
          );
  • 32. As a Procedure (Single SELECTs)
        CREATE PROCEDURE find_holes_naive(stat CHAR(3))
        BEGIN
            DECLARE dt DATETIME DEFAULT '2004-01-01 00:00:00';
            DECLARE c INT;
            WHILE dt < '2005-01-01 00:00:00' DO
                SELECT COUNT(*) INTO c
                  FROM temperatures
                 WHERE (stat_id, datetime) = (stat, dt);
                IF c = 0 THEN
                    -- missing row
                    SELECT stat, dt;
                END IF;
                SET dt = dt + INTERVAL 10 MINUTE;
            END WHILE;
        END //
  • 34. As a Procedure (Ordered Set)
        CREATE PROCEDURE find_holes_ordered(stat CHAR(3))
        BEGIN
            DECLARE no_more_rows BOOLEAN DEFAULT FALSE;
            DECLARE dt1 DATETIME DEFAULT '2004-01-01 00:00:00';
            DECLARE dt2 DATETIME;
            DECLARE temperatures_cursor CURSOR FOR
                SELECT datetime FROM temperatures
                WHERE stat_id = stat ORDER BY datetime ASC;
            DECLARE CONTINUE HANDLER FOR NOT FOUND
                SET no_more_rows = TRUE;
            OPEN temperatures_cursor;
            temperatures_loop: LOOP
                FETCH temperatures_cursor INTO dt2;
                WHILE dt1 != dt2 DO
                    SELECT stat, dt1;
                    SET dt1 = dt1 + INTERVAL 10 MINUTE;
                    IF dt1 >= '2005-01-01 00:00:00' THEN
                        LEAVE temperatures_loop;
                    END IF;
                END WHILE;
                SET dt1 = dt1 + INTERVAL 10 MINUTE;
                IF dt1 >= '2005-01-01 00:00:00' THEN
                    LEAVE temperatures_loop;
                END IF;
            END LOOP temperatures_loop;
            CLOSE temperatures_cursor;
        END //
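The ordered-set walk can also be written down outside the database. The following is a sketch in plain Python, not a translation of the procedure above: the function name mirrors it, but the parameters are hypothetical, and it assumes `present` is sorted, free of duplicates, and aligned to the `step` grid starting at `start`.

```python
from datetime import datetime, timedelta


def find_holes_ordered(present, start, end, step):
    """Walk the sorted timestamps once and collect every missing grid point.

    The checked range is start (inclusive) to end (exclusive).
    """
    missing = []
    expected = start
    for found in present:
        # Everything between the expected and the found timestamp is a hole.
        while expected < found and expected < end:
            missing.append(expected)
            expected += step
        if expected == found:
            expected += step
    # After the last row, the rest of the range is missing as well.
    while expected < end:
        missing.append(expected)
        expected += step
    return missing


step = timedelta(minutes=10)
present = [
    datetime(2004, 11, 3, 23, 0),
    datetime(2004, 11, 3, 23, 10),
    datetime(2004, 11, 4, 0, 10),
]
holes = find_holes_ordered(
    present, datetime(2004, 11, 3, 23, 0), datetime(2004, 11, 4, 0, 20), step
)
```

Both the data and the expected sequence are visited exactly once, which is the same single pass a merge join performs.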
  • 35. Self-Reference (LEFT self-JOIN)
        SELECT *
        FROM temperatures
        LEFT JOIN temperatures AS missing
               ON temperatures.stat_id = missing.stat_id
              AND temperatures.datetime + INTERVAL 10 MINUTE
                = missing.datetime
        WHERE temperatures.stat_id = 'ABO'
          AND missing.datetime IS NULL;
  • 36. Self-Reference (NOT EXISTS)
        SELECT *
        FROM temperatures
        WHERE NOT EXISTS (
            SELECT * FROM temperatures AS missing
            WHERE missing.datetime
                = temperatures.datetime + INTERVAL 10 MINUTE
              AND missing.stat_id = temperatures.stat_id
        )
          AND stat_id = 'ABO';
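A small sketch of the self-reference approach, assuming SQLite via Python's `sqlite3` module. SQLite has no `INTERVAL` syntax, so `datetime(…, '+10 minutes')` is used instead, and the query returns the first missing timestamp of each gap rather than the whole row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperatures ("
    " stat_id TEXT NOT NULL, datetime TEXT NOT NULL, tt REAL,"
    " PRIMARY KEY (stat_id, datetime))"
)
conn.executemany(
    "INSERT INTO temperatures VALUES ('ABO', ?, ?)",
    [
        ("2004-11-03 23:00:00", 9.9),
        ("2004-11-03 23:10:00", 7.8),
        ("2004-11-04 00:10:00", 9.2),
    ],
)

# A row qualifies when no row exists ten minutes after it.
gap_starts = [
    row[0]
    for row in conn.execute(
        """
        SELECT datetime(t.datetime, '+10 minutes') AS first_missing
        FROM temperatures AS t
        WHERE t.stat_id = 'ABO'
          AND NOT EXISTS (
                SELECT 1 FROM temperatures AS nxt
                WHERE nxt.stat_id = t.stat_id
                  AND nxt.datetime = datetime(t.datetime, '+10 minutes'))
        ORDER BY first_missing
        """
    )
]
```

Without a sequence table, only the start of each gap is found, and the last row of the table always qualifies because nothing follows it.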
  • 37. What's the performance?
        • SeqEngine
          ‣ LEFT JOIN
          ‣ NOT EXISTS
        • Self-Reference
          ‣ LEFT self-JOIN
          ‣ NOT EXISTS
        • Stored Procedures
          ‣ Naïve (Single SELECTs)
          ‣ Standard (Ordered SET)
  • 38. The benchmark
        All the usual disclaimers for benchmarks apply: Go ahead and measure it
        with your hardware, your version of MySQL, your storage engines, your
        data sets and your server configuration settings.

        +---+----------------------------+----------------------------+----------+
        | # | Query                      | Remarks                    | Time [s] |
        +---+----------------------------+----------------------------+----------+
        | 1 | SeqEngine (NOT EXISTS)     |                            |     0.28 |
        | 2 | SeqEngine (LEFT JOIN)      |                            |     0.29 |
        | 3 | Procedure (Ordered SET)    | result set per missing row |     0.59 |
        | 4 | Self (NOT EXISTS)          | only first missing row     |     0.93 |
        | 5 | Self (LEFT JOIN)           | only first missing row     |     1.10 |
        | 6 | Procedure (Single SELECTs) | result set per missing row |     2.80 |
        +---+----------------------------+----------------------------+----------+
  • 39. The benchmark
        [Bar chart, x axis from 0s to 3.0s]
        1. SeqEngine (NOT EXISTS)      0.28s
        2. SeqEngine (LEFT JOIN)       0.29s
        3. Procedure (Ordered SET)     0.59s
        4. Self Reference (NOT EXISTS) 0.93s
        5. Self Reference (LEFT JOIN)  1.10s
        6. Procedure (Single SELECTs)  2.80s
  • 40. Lessons to be learned…
        • The Sequence trick (and SeqEngine) worked
          ‣ It may sometimes pay off to go the extra mile and write a
            custom storage engine!
        • Stored PROCEDUREs with CURSORs sometimes can be damned fast!
        • Subquery optimization really did progress in MySQL
          (at least in some parts, more to come with 6.0)
          ‣ Consider NOT EXISTS over LEFT JOIN
  • 41. 2nd use case: Generate Test Data
        mysql> CREATE TABLE large (i INT NOT NULL)
            -> ENGINE=SeqEngine CONNECTION='1;10000000;1';
        Query OK, 0 rows affected (0,12 sec)

        mysql> ALTER TABLE large ENGINE=MyISAM;
        Query OK, 10000000 rows affected (3,27 sec)
        Records: 10000000 Duplicates: 0 Warnings: 0
  • 42. Generating other Sequences from Integers
        CREATE VIEW letters AS
            SELECT CHAR(i) FROM integer_sequence;

        CREATE VIEW timestamps AS
            SELECT FROM_UNIXTIME(i) FROM integer_sequence;

        CREATE VIEW squares AS
            SELECT i*i FROM integer_sequence;
        …
  • 43. Generate very large and complex data sets
        INSERT INTO customers
        SELECT i AS customer_id,
               MD5(i) AS customer_name,
               ROUND(RAND()*80+1920) AS customer_year
        FROM large;

        SELECT * FROM customers;
        +-------------+---------------------+---------------+
        | customer_id | customer_name       | customer_year |
        +-------------+---------------------+---------------+
        |           1 | c4ca4238a0b9f75849… |          1935 |
        |           2 | c81e728d9d4c2f636f… |          1967 |
        |           … | …                   |             … |
        +-------------+---------------------+---------------+
        10000000 rows in set
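The same INSERT … SELECT pattern can be tried on a small scale with the sketch below, assuming SQLite via Python's `sqlite3` module. SQLite has no built-in `MD5()`, so one is registered from `hashlib` as a user-defined function, and the year expression uses SQLite's `random()` in place of MySQL's `RAND()`. The sequence table is filled from Python here because no SeqEngine is available.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for MySQL's MD5(): hash the decimal text of the value.
conn.create_function(
    "md5", 1, lambda value: hashlib.md5(str(value).encode()).hexdigest()
)

# Small sequence table 1..1000 (the slides use 10 million rows).
conn.execute("CREATE TABLE large (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO large VALUES (?)", ((n,) for n in range(1, 1001)))

conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " customer_name TEXT NOT NULL,"
    " customer_year INTEGER NOT NULL)"
)
# One generated customer per sequence value; years fall in 1920..2000.
conn.execute(
    """
    INSERT INTO customers
    SELECT i, md5(i), 1920 + abs(random() % 81)
    FROM large
    """
)

row_count, min_year, max_year = conn.execute(
    "SELECT COUNT(*), MIN(customer_year), MAX(customer_year) FROM customers"
).fetchone()
first_name = conn.execute(
    "SELECT customer_name FROM customers WHERE customer_id = 1"
).fetchone()[0]
```

Any deterministic or random expression over `i` can be added as a further column, so a single statement fills a table of arbitrary width.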
  • 44. "Salvage" a bad design
        One-to-Many gone wrong: Table `users`
        +----------+--------+---------+---------+
        | username | sel1   | sel2    | sel3    |
        +----------+--------+---------+---------+
        | john     | apple  | orange  | pear    |
        | bill     | NULL   | NULL    | NULL    |
        | emma     | banana | pear    | NULL    |
        +----------+--------+---------+---------+
  • 45. "Salvage" a bad design
        CREATE TABLE salvage (
            col INT NOT NULL
        ) ENGINE=SeqEngine CONNECTION='1;3;1';

        +-----+
        | col |
        +-----+
        |   1 |
        |   2 |
        |   3 |
        +-----+
  • 46. "Multiply" the rows with a cartesian JOIN
        mysql> SELECT * FROM users CROSS JOIN salvage;
        +----------+--------+--------+------+-----+
        | username | sel1   | sel2   | sel3 | col |
        +----------+--------+--------+------+-----+
        | bill     | NULL   | NULL   | NULL |   1 |
        | bill     | NULL   | NULL   | NULL |   2 |
        | bill     | NULL   | NULL   | NULL |   3 |
        | emma     | banana | pear   | NULL |   3 |
        | emma     | banana | pear   | NULL |   1 |
        | emma     | banana | pear   | NULL |   2 |
        | john     | apple  | orange | pear |   1 |
        | john     | apple  | orange | pear |   2 |
        | john     | apple  | orange | pear |   3 |
        +----------+--------+--------+------+-----+
        9 rows in set (0,00 sec)
  • 48. Normalized on the fly
        SELECT username,
               CASE col WHEN 1 THEN sel1
                        WHEN 2 THEN sel2
                        WHEN 3 THEN sel3
               END AS sel
        FROM users CROSS JOIN salvage
        HAVING sel IS NOT NULL;

        +----------+--------+
        | username | sel    |
        +----------+--------+
        | john     | apple  |
        | emma     | banana |
        | john     | orange |
        | emma     | pear   |
        | john     | pear   |
        +----------+--------+
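The unpivot step can be reproduced with the following sketch, assuming SQLite via Python's `sqlite3` module. The MySQL-specific `HAVING` on a column alias is replaced by a derived table with a `WHERE` clause, and an `ORDER BY` is added so the result order is defined; the small `salvage` table is filled by hand since there is no SeqEngine here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, sel1 TEXT, sel2 TEXT, sel3 TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("john", "apple", "orange", "pear"),
        ("bill", None, None, None),
        ("emma", "banana", "pear", None),
    ],
)
# Sequence 1..3: one value per repeated column.
conn.execute("CREATE TABLE salvage (col INTEGER NOT NULL)")
conn.executemany("INSERT INTO salvage VALUES (?)", [(1,), (2,), (3,)])

normalized = conn.execute(
    """
    SELECT username, sel
    FROM (
        SELECT username,
               CASE col WHEN 1 THEN sel1
                        WHEN 2 THEN sel2
                        WHEN 3 THEN sel3
               END AS sel
        FROM users CROSS JOIN salvage   -- three copies of every user row
    )
    WHERE sel IS NOT NULL               -- drop the empty slots
    ORDER BY sel, username
    """
).fetchall()
```

Each copy of a user row picks a different `selN` column, which turns the repeated columns into one row per selection.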
  • 49. Comma-Separated Attribute Lists
        mysql> DESCRIBE selections;
        +------------+--------------+------+-----+---------+
        | Field      | Type         | Null | Key | Default |
        +------------+--------------+------+-----+---------+
        | username   | varchar(5)   | NO   | PRI | NULL    |
        | selections | varchar(255) | NO   |     | NULL    |
        +------------+--------------+------+-----+---------+

        mysql> SELECT * FROM selections;
        +----------+-------------------+
        | username | selections        |
        +----------+-------------------+
        | john     | apple,orange,pear |
        | bill     |                   |
        | emma     | banana,pear       |
        +----------+-------------------+
  • 50. Querying Comma-Separated Attribute Lists
        SELECT username,
               SUBSTRING_INDEX(
                   SUBSTRING_INDEX(
                       selections,
                       ',',
                       i
                   ),
                   ',',
                   -1
               ) AS selection
        FROM selections JOIN integers
        HAVING selection NOT LIKE '';
  • 51. Querying Comma-Separated Attribute Lists
        SELECT username,
               -- Take last element
               SUBSTRING_INDEX(
                   -- Crop list after element i
                   SUBSTRING_INDEX(
                       -- Add empty sentinel element
                       CONCAT(selections, ','),
                       ',',
                       i
                   ),
                   ',',
                   -1
               ) AS selection
        FROM selections JOIN integers
        HAVING selection NOT LIKE '';
  • 54. Counting members from attribute lists
        SELECT SUBSTRING_INDEX(
                   SUBSTRING_INDEX(
                       CONCAT(selections, ','),
                       ',',
                       i
                   ),
                   ',',
                   -1
               ) AS selection,
               COUNT(*)
        FROM selections JOIN integers
        GROUP BY selection
        HAVING selection NOT LIKE '';

        +-----------+----------+
        | selection | COUNT(*) |
        +-----------+----------+
        | apple     |        1 |
        | banana    |        1 |
        | orange    |        1 |
        | pear      |        2 |
        +-----------+----------+
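To see why the sentinel comma and the two nested `SUBSTRING_INDEX` calls work, here is a sketch assuming SQLite via Python's `sqlite3` module. SQLite does not ship `SUBSTRING_INDEX`, so the Python function below is registered as a stand-in that follows MySQL's documented behaviour for positive and negative counts; `||` replaces `CONCAT`, and the `integers` table is filled by hand.

```python
import sqlite3


def substring_index(text, delim, count):
    """Stand-in for MySQL's SUBSTRING_INDEX(text, delim, count)."""
    if text is None:
        return None
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # right of the count-th from the end
    return ""


conn = sqlite3.connect(":memory:")
conn.create_function("substring_index", 3, substring_index)

conn.execute("CREATE TABLE integers (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO integers VALUES (?)", [(n,) for n in range(1, 11)])
conn.execute(
    "CREATE TABLE selections (username TEXT PRIMARY KEY, selections TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO selections VALUES (?, ?)",
    [("john", "apple,orange,pear"), ("bill", ""), ("emma", "banana,pear")],
)

counts = conn.execute(
    """
    SELECT selection, COUNT(*)
    FROM (
        SELECT substring_index(
                   substring_index(selections || ',', ',', i),  -- crop after element i
                   ',', -1                                      -- take last element
               ) AS selection
        FROM selections CROSS JOIN integers
    )
    WHERE selection <> ''     -- positions past the end yield the empty sentinel
    GROUP BY selection
    ORDER BY selection
    """
).fetchall()
```

The sequence value `i` plays the role of the loop index over list positions. Once `i` runs past the last element, the sentinel makes the expression return an empty string, which the filter removes.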
  • 55. Problem: Variable-sized IN-Predicates
        • Statements can't be prepared for variable-sized lists in the
          IN clause:
          ‣ SELECT * FROM x WHERE a IN (?)
        • One needs:
          ‣ SELECT * FROM x WHERE a IN (?)
          ‣ SELECT * FROM x WHERE a IN (?, ?)
          ‣ SELECT * FROM x WHERE a IN (?, ?, ?, …)
        • Example from Stéphane Faroult: "The Art of SQL",
          adapted for MySQL
  • 56. Split arguments as before!
        SELECT …
        FROM rental
        INNER JOIN customer ON rental.customer_id = …
        INNER JOIN address ON …
        …
        INNER JOIN (
            SELECT SUBSTRING_INDEX(
                       SUBSTRING_INDEX(CONCAT(?, ","), ",", i),
                       ",", -1
                   ) AS customer_id
            FROM sequences.integers
            WHERE i <= ?
        ) AS s ON rental.customer_id = s.customer_id
        …
        WHERE …;
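The idea of binding the whole list as one string parameter can be sketched as follows, assuming SQLite via Python's `sqlite3` module. The `rental` table is reduced to two columns, `substring_index` is a registered Python stand-in for the MySQL function, and `rentals_for` is a hypothetical helper. The list may not be longer than the `integers` table.

```python
import sqlite3


def substring_index(text, delim, count):
    """Stand-in for MySQL's SUBSTRING_INDEX(text, delim, count)."""
    if text is None:
        return None
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


conn = sqlite3.connect(":memory:")
conn.create_function("substring_index", 3, substring_index)
conn.execute("CREATE TABLE integers (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO integers VALUES (?)", [(n,) for n in range(1, 21)])
conn.execute(
    "CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO rental VALUES (?, ?)", [(1, 3), (2, 5), (3, 7), (4, 3)])

# One statement text for every list size: two parameters, the list and its length.
QUERY = """
    SELECT rental.rental_id
    FROM rental
    INNER JOIN (
        SELECT CAST(
                   substring_index(substring_index(? || ',', ',', i), ',', -1)
                   AS INTEGER
               ) AS customer_id
        FROM integers
        WHERE i <= ?
    ) AS s ON rental.customer_id = s.customer_id
    ORDER BY rental.rental_id
"""


def rentals_for(customer_ids):
    """Return the rental ids for any number of distinct customer ids."""
    id_list = ",".join(str(cid) for cid in customer_ids)
    return [row[0] for row in conn.execute(QUERY, (id_list, len(customer_ids)))]
```

Because the statement text never changes, it can be prepared once and reused, whether the caller passes one id or twenty.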
  • 57. SQL-String-Parsing beats Query-Parsing!
        Execution Times in Seconds for a different number of runs
        (lower is better)
        [Bar chart. Series: Prepared/Sequence, Client-side IN-List.
         x axis: 100x, 1000x, 2000x, 3000x runs; y axis: 0 to 20 seconds.
         Data labels: 0.5, 0.6, 5.4, 6.0, 10.9, 12.1, 16.3, 18.1]
  • 58. Sequences and SeqEngine: Conclusion
        • Use Sequences (and SeqEngine) to e.g.:
          ‣ Find missing rows
          ‣ Generate test data
          ‣ Pivot tables
          ‣ Do clever things with "for-loops" (string parsing etc.)
        • http://seqengine.org
          ‣ Slides will be available shortly after the presentation
            (also on the conference website)