SlideShare ist ein Scribd-Unternehmen logo
1 von 97
Downloaden Sie, um offline zu lesen
Is there a perfect
data-parallel
programming language?
Experiments with Morel and Apache Calcite
Julian Hyde @julianhyde (Looker/Google)
March 7th, 2022
Agenda
Programming language
types (and how they might be
unified)
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Agenda
1. Data parallel
2. Functional
3. Functional + Relational
4. Functional + Data parallel
5. Functional + Deductive
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Agenda
1. Data parallel
2. Functional
3. Functional + Relational
4. Functional + Data parallel
5. Functional + Deductive
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Agenda
1. Data parallel
2. Functional
3. Functional + Relational
4. Functional + Data parallel
5. Functional + Deductive
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Agenda
1. Data parallel
2. Functional
3. Functional + Relational
4. Functional + Data parallel
5. Functional + Deductive
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Agenda
1. Data parallel
2. Functional
3. Functional + Relational
4. Functional + Data parallel
5. Functional + Deductive
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Agenda
1. Data parallel
2. Functional
3. Functional + Relational
4. Functional + Data parallel
5. Functional + Deductive
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
1. Data parallel
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Explained via two data parallel
problems (text indexing and
WordCount)
Text files
(input)
. .
.
Text files
(input)
Index files
(output)
Building a document index
a b c x y z
x y z
split split split
reduce reduce
reduce reduce
Text files
(input)
Index files
(output)
Building a document index
a b c
yellow: 12,
zebra: 9
WordCount
split split split
reduce reduce
reduce reduce
Text files
(input)
Word counts
(output)
aardvark: 3,
badger: 7
... ...
Data-parallel programming
Batch processing
Input is a large, immutable data sets
Processing can be processed in parallel pipelines
Significant chance of hardware failure during execution
Related programming models:
● Stream and incremental processing
● Deductive and graph programming
Data-parallel programming
Data-parallel
framework
(e.g.
MapReduce,
FlumeJava,
Apache Spark)
(*) You write two small functions
fun wc_mapper line =
List.map (fn w => (w, 1)) (split line);
fun wc_reducer (key, values) =
List.foldl (fn (x, y) => x + y) 0 values;
(*) The framework provides mapReduce
fun mapReduce mapper reducer list = ...;
(*) Combine them to build your program
fun wordCount list = mapReduce wc_mapper wc_reducer list;
Data-parallel programming
Data-parallel
framework
(e.g.
MapReduce,
FlumeJava,
Apache Spark)
(*) You write two small functions
fun wc_mapper line =
List.map (fn w => (w, 1)) (split line);
fun wc_reducer (key, values) =
List.foldl (fn (x, y) => x + y) 0 values;
(*) The framework provides mapReduce
fun mapReduce mapper reducer list = ...;
(*) Combine them to build your program
fun wordCount list = mapReduce wc_mapper wc_reducer list;
SQL SELECT word, COUNT(*) AS c
FROM Documents AS d,
LATERAL TABLE (split(d.text)) AS word // requires a ‘split’ UDF
GROUP BY word;
Data-parallel programming
Data-parallel
framework
(e.g.
MapReduce,
FlumeJava,
Apache Spark)
(*) You write two small functions
fun wc_mapper line =
List.map (fn w => (w, 1)) (split line);
fun wc_reducer (key, values) =
List.foldl (fn (x, y) => x + y) 0 values;
(*) The framework provides mapReduce
fun mapReduce mapper reducer list = ...;
(*) Combine them to build your program
fun wordCount list = mapReduce wc_mapper wc_reducer list;
SQL SELECT word, COUNT(*) AS c
FROM Documents AS d,
LATERAL TABLE (split(d.text)) AS word // requires a ‘split’ UDF
GROUP BY word;
Morel from line in lines,
word in split line (*) requires ‘split’ function - see later...
group word compute c = count
What are our options?
Extend SQL Extend a functional programming
language
We’ll need to:
● Allow functions defined in
queries
● Add relations-as-values
● Add functions-as-values
● Modernize the type system
(adding type variables, function
types, algebraic types)
● Write an optimizing compiler
We’ll need to:
● Add syntax for relational operations
● Map onto external data
● Write a query optimizer
Nice stuff we get for free:
● Algebraic types
● Pattern matching
● Inline function and value declarations
Morel is a functional programming language. It is
derived from Standard ML, and is extended with
list comprehensions and other relational
operators. Like Standard ML, Morel has
parametric and algebraic data types with
Hindley-Milner type inference. Morel is
implemented in Java, and is optimized and
executed using a combination of techniques from
functional programming language compilers and
database query optimizers.
Evolution of functional
languages
Standard
ML
OCaml
Haskell
Scala
ML
Java
F#
C#
Lisp
Scheme
Clojure
1958
1975
1973
1983
1995
1995
1990
2000
2004
2005
2007
Extend Standard ML
Standard ML is strongly typed, and
can very frequently deduce every
type in a program.
Standard ML has record types.
(These are important for queries.)
Haven’t decided whether Morel is
eager (like Standard ML) or lazy
(like Haskell and SQL)
Haskell’s type system is more
powerful. So, Haskell programs
often have explicit types.
Standard
ML
Haskell
Scala
ML
Java
F#
C#
Morel
SQL
Lisp
Scheme
Clojure
OCaml
1975
2019
Morel themes
Early stage language
The target audience is SQL users
Less emphasis on abstract algebra
Program compilation, query optimization
Functional programming in-the-small, and in-the-large
2. Functional
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
A quick introduction to
Standard ML
All of the following examples work in both Standard ML
and Morel.
Standard ML: values and types
- "Hello, world!";
val it = "Hello, world!" : string
- 1 + 2;
val it = 3 : int
- ~1.5;
val it = ~1.5 : real
- [1, 1, 2, 3, 5];
val it = [1,1,2,3,5] : int list
- fn i => i mod 2 = 1;
val it = fn : int -> bool
- (1, "a");
val it = (1,"a") : int * string
- {name = "Fred", empno = 100};
val it = {empno=100,name="Fred"} : {empno:int, name:string}
Standard ML: variables and functions
- val x = 1;
val x = 1 : int
- val isOdd = fn i => i mod 2 = 1;
val isOdd = fn : int -> bool
- fun isOdd i = i mod 2 = 0;
val isOdd = fn : int -> bool
- isOdd x;
val it = true : bool
- let
= val x = 6
= fun isOdd i = i mod 2 = 1
= in
= isOdd x
= end;
val it = false : bool
val assigns a value to a variable.
fun declares a function.
● fun is syntactic sugar for
val ... = fn ... => ...
let allows you to make several declarations
before evaluating an expression.
- datatype 'a tree =
= EMPTY
= | LEAF of 'a
= | NODE of ('a * 'a tree * 'a tree);
Algebraic data types, case, and recursion
datatype declares an algebraic data type. 'a Is a
type variable and therefore 'a tree is a
polymorphic type.
- datatype 'a tree =
= EMPTY
= | LEAF of 'a
= | NODE of ('a * 'a tree * 'a tree);
- val t =
= NODE (1, LEAF 2, NODE (3, EMPTY, LEAF 7));
EMPTY
Algebraic data types, case, and recursion
datatype declares an algebraic data type. 'a Is a
type variable and therefore 'a tree is a
polymorphic type.
Define an instance of tree using its constructors
NODE, LEAF and EMPTY.
LEAF
2
NODE
1
NODE
3
LEAF
7
- datatype 'a tree =
= EMPTY
= | LEAF of 'a
= | NODE of ('a * 'a tree * 'a tree);
- val t =
= NODE (1, LEAF 2, NODE (3, EMPTY, LEAF 7));
- val rec sumTree = fn t =>
= case t of EMPTY => 0
= | LEAF i => i
= | NODE (i, l, r) =>
= i + sumTree l + sumTree r;
val sumTree = fn : int tree -> int
- sumTree t;
val it = 13 : int
Algebraic data types, case, and recursion
datatype declares an algebraic data type. 'a Is a
type variable and therefore 'a tree is a
polymorphic type.
Define an instance of tree using its constructors
NODE, LEAF and EMPTY.
case matches patterns,
deconstructing data types, and
binding variables as it goes.
sumTree use case and calls
itself recursively.
EMPTY
LEAF
2
NODE
1
NODE
3
LEAF
7
- datatype 'a tree =
= EMPTY
= | LEAF of 'a
= | NODE of ('a * 'a tree * 'a tree);
- val t =
= NODE (1, LEAF 2, NODE (3, EMPTY, LEAF 7));
- val rec sumTree = fn t =>
= case t of EMPTY => 0
= | LEAF i => i
= | NODE (i, l, r) =>
= i + sumTree l + sumTree r;
val sumTree = fn : int tree -> int
- sumTree t;
val it = 13 : int
- fun sumTree EMPTY = 0
= | sumTree (LEAF i) = i
= | sumTree (NODE (i, l, r)) =
= i + sumTree l + sumTree r;
val sumTree = fn : int tree -> int
Algebraic data types, case, and recursion
datatype declares an algebraic data type. 'a Is a
type variable and therefore 'a tree is a
polymorphic type.
Define an instance of tree using its constructors
NODE, LEAF and EMPTY.
case matches patterns,
deconstructing data types, and
binding variables as it goes.
sumTree use case and calls
itself recursively.
EMPTY
LEAF
2
NODE
1
NODE
3
LEAF
7
fun is a shorthand for
case and val rec.
Relations and higher-order functions
Relations are lists of records.
A higher-order function is a function whose
arguments or return value are functions.
Common higher-order functions that act on
lists:
● List.filter – equivalent to the Filter
relational operator (SQL WHERE)
● List.map – equivalent to Project
relational operator (SQL SELECT)
● flatMap – as List.map, but may
produce several output elements per
input element
#deptno is a system-generated function that
accesses the deptno field of a record.
- val emps = [
= {id = 100, name = "Fred", deptno = 10},
= {id = 101, name = "Velma", deptno = 20},
= {id = 102, name = "Shaggy", deptno = 30},
= {id = 103, name = "Scooby", deptno = 30}];
- List.filter (fn e => #deptno e = 30) emps;
val it = [{deptno=30,id=102,name="Shaggy"},
{deptno=30,id=103,name="Scooby"}]
: {deptno:int, id:int, name:string} list
SELECT *
FROM emps AS e
WHERE e.deptno = 30
Equivalent SQL
Implementing Join using higher-order functions
List.map function is equivalent to
Project relational operator (SQL
SELECT)
We also define a flatMap function.
- val depts = [
= {deptno = 10, name = "Sales"},
= {deptno = 20, name = "Marketing"},
= {deptno = 30, name = "R&D"}];
- fun flatMap f xs = List.concat (List.map f xs);
val flatMap =
fn : ('a -> 'b list) -> 'a list -> 'b list
- List.map
= (fn (d, e) => {deptno = #deptno d, name = #name e})
= (List.filter
= (fn (d, e) => #deptno d = #deptno e)
= (flatMap
= (fn e => (List.map (fn d => (d, e)) depts))
= emps));
val it =
[{deptno=10,name="Fred"}, {deptno=20,name="Velma"},
{deptno=30,name="Shaggy"}, {deptno=30,name="Scooby"}]
: {deptno:int, name:string} list
SELECT d.deptno, e.name
FROM emps AS e,
depts AS d
WHERE d.deptno = e.deptno
Equivalent SQL
3. Functional +
Relational
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Morel
List.map
(fn (d, e) => {deptno = #deptno d, name = #name e})
(List.filter
(fn (d, e) => #deptno d = #deptno e)
(flatMap
(fn e => (List.map (fn d => (d, e)) depts))
emps));
Implementing Join in Morel using from
Morel extensions to Standard ML:
● from operator creates a list
comprehension
● x.field is shorthand for
#field x
● {#field} is shorthand for
{field = #field x}
SELECT d.deptno, e.name
FROM emps AS e,
depts AS d
WHERE d.deptno = e.deptno
Equivalent SQL
List.map
(fn (d, e) => {deptno = #deptno d, name = #name e})
(List.filter
(fn (d, e) => #deptno d = #deptno e)
(flatMap
(fn e => (List.map (fn d => (d, e)) depts))
emps));
from e in emps,
d in depts
where #deptno e = #deptno d
yield {deptno = #deptno d, name = #name e};
Implementing Join in Morel using from
Morel extensions to Standard ML:
● from operator creates a list
comprehension
● x.field is shorthand for
#field x
● {#field} is shorthand for
{field = #field x}
SELECT d.deptno, e.name
FROM emps AS e,
depts AS d
WHERE d.deptno = e.deptno
Equivalent SQL
List.map
(fn (d, e) => {deptno = #deptno d, name = #name e})
(List.filter
(fn (d, e) => #deptno d = #deptno e)
(flatMap
(fn e => (List.map (fn d => (d, e)) depts))
emps));
from e in emps,
d in depts
where #deptno e = #deptno d
yield {deptno = #deptno d, name = #name e};
from e in emps,
d in depts
where e.deptno = d.deptno
yield {d.deptno, e.name};
Implementing Join in Morel using from
Morel extensions to Standard ML:
● from operator creates a list
comprehension
● x.field is shorthand for
#field x
● {#field} is shorthand for
{field = #field x}
SELECT d.deptno, e.name
FROM emps AS e,
depts AS d
WHERE d.deptno = e.deptno
Equivalent SQL
WordCount
let
fun split0 [] word words = word :: words
| split0 (#" " :: s) word words = split0 s "" (word :: words)
| split0 (c :: s) word words = split0 s (word ^ (String.str c)) words
fun split = List.rev (split0 (String.explode s) "" [])
in
from line in lines,
word in split line
group word compute c = count
end;
WordCount
- let
= fun split0 [] word words = word :: words
= | split0 (#" " :: s) word words = split0 s "" (word :: words)
= | split0 (c :: s) word words = split0 s (word ^ (String.str c)) words
= fun split = List.rev (split0 (String.explode s) "" [])
= in
= from line in lines,
= word in split line
= group word compute c = count
= end;
val wordCount = fn : string list -> {c:int, word:string} list
- wordCount ["a skunk sat on a stump",
= "and thunk the stump stunk",
= "but the stump thunk the skunk stunk"];
val it =
[{c=2,word="a"},{c=3,word="the"},{c=1,word="but"},
{c=1,word="sat"},{c=1,word="and"},{c=2,word="stunk"},
{c=3,word="stump"},{c=1,word="on"},{c=2,word="thunk"},
{c=2,word="skunk"}] : {c:int, word:string} list
- let
= fun split0 [] word words = word :: words
= | split0 (#" " :: s) word words = split0 s "" (word :: words)
= | split0 (c :: s) word words = split0 s (word ^ (String.str c)) words
= fun split = List.rev (split0 (String.explode s) "" [])
= in
= from line in lines,
= word in split line
= group word compute c = count
= end;
val wordCount = fn : string list -> {c:int, word:string} list
- wordCount ["a skunk sat on a stump",
= "and thunk the stump stunk",
= "but the stump thunk the skunk stunk"];
val it =
[{c=2,word="a"},{c=3,word="the"},{c=1,word="but"},
{c=1,word="sat"},{c=1,word="and"},{c=2,word="stunk"},
{c=3,word="stump"},{c=1,word="on"},{c=2,word="thunk"},
{c=2,word="skunk"}] : {c:int, word:string} list
WordCount
Functional
programming
“in the small”
WordCount
- let
= fun split0 [] word words = word :: words
= | split0 (#" " :: s) word words = split0 s "" (word :: words)
= | split0 (c :: s) word words = split0 s (word ^ (String.str c)) words
= fun split = List.rev (split0 (String.explode s) "" [])
= in
= from line in lines,
= word in split line
= group word compute c = count
= end;
val wordCount = fn : string list -> {c:int, word:string} list
- wordCount ["a skunk sat on a stump",
= "and thunk the stump stunk",
= "but the stump thunk the skunk stunk"];
val it =
[{c=2,word="a"},{c=3,word="the"},{c=1,word="but"},
{c=1,word="sat"},{c=1,word="and"},{c=2,word="stunk"},
{c=3,word="stump"},{c=1,word="on"},{c=2,word="thunk"},
{c=2,word="skunk"}] : {c:int, word:string} list
Functional
programming
“in the large”
Functions as views, functions as values
- fun emps2 () =
= from e in emps
= yield {e.id,
= e.name,
= e.deptno,
= comp = fn revenue => case e.deptno of
= 30 => e.id + revenue / 2
= | _ => e.id};
val emps2 = fn : unit -> {comp:int -> int,
deptno:int, id:int, name:string} list
Functions as views, functions as values
emps2 is a function
that returns a
collection
- fun emps2 () =
= from e in emps
= yield {e.id,
= e.name,
= e.deptno,
= comp = fn revenue => case e.deptno of
= 30 => e.id + revenue / 2
= | _ => e.id};
val emps2 = fn : unit -> {comp:int -> int,
deptno:int, id:int, name:string} list
- fun emps2 () =
= from e in emps
= yield {e.id,
= e.name,
= e.deptno,
= comp = fn revenue => case e.deptno of
= 30 => e.id + revenue / 2
= | _ => e.id};
val emps2 = fn : unit -> {comp:int -> int,
deptno:int, id:int, name:string} list
Functions as views, functions as values
The comp field is a
function value (in
fact, it’s a closure)
emps2 is a function
that returns a
collection
Functions as views, functions as values
- fun emps2 () =
= from e in emps
= yield {e.id,
= e.name,
= e.deptno,
= comp = fn revenue => case e.deptno of
= 30 => e.id + revenue / 2
= | _ => e.id};
val emps2 = fn : unit -> {comp:int -> int,
deptno:int, id:int, name:string} list
- fun emps2 () =
= from e in emps
= yield {e.id,
= e.name,
= e.deptno,
= comp = fn revenue => case e.deptno of
= 30 => e.id + revenue / 2
= | _ => e.id};
val emps2 = fn : unit -> {comp:int -> int,
deptno:int, id:int, name:string} list
- from e in emps2 ()
= yield {e.name, e.id, c = e.comp 1000};
The comp field is a
function value (in
fact, it’s a closure)
emps2 is a function
that returns a
collection
Functions as views, functions as values
- fun emps2 () =
= from e in emps
= yield {e.id,
= e.name,
= e.deptno,
= comp = fn revenue => case e.deptno of
= 30 => e.id + revenue / 2
= | _ => e.id};
val emps2 = fn : unit -> {comp:int -> int,
deptno:int, id:int, name:string} list
- from e in emps2 ()
= yield {e.name, e.id, c = e.comp 1000};
val it =
[{c=100,id=100,name="Fred"},
{c=101,id=101,name="Velma"},
{c=602,id=102,name="Shaggy"},
{c=603,id=103,name="Scooby"}]
: {c:int, id:int, name:string} list
Chaining relational operators
- from e in emps
= order e.deptno, e.id desc
= yield {e.name, nameLength = String.size e.name, e.id, e.deptno}
= where nameLength > 4
= group deptno compute c = count, s = sum of nameLength
= where s > 10
= yield c + s;
val it = [14] : int list
Chaining relational operators - step 1
- from e in emps;
val it =
[{deptno=10,id=100,name="Fred"},
{deptno=20,id=101,name="Velma"},
{deptno=30,id=102,name="Shaggy"},
{deptno=30,id=103,name="Scooby"}]
: {deptno:int, id:int, name:string} list
Chaining relational operators - step 2
- from e in emps
= order e.deptno, e.id desc;
val it =
[{deptno=10,id=100,name="Fred"},
{deptno=20,id=101,name="Velma"},
{deptno=30,id=103,name="Scooby"},
{deptno=30,id=102,name="Shaggy"}]
: {deptno:int, id:int, name:string} list
Chaining relational operators - step 3
- from e in emps
= order e.deptno, e.id desc
= yield {e.name, nameLength = String.size e.name, e.id, e.deptno};
val it =
[{deptno=10,id=100,name="Fred",nameLength=4},
{deptno=20,id=101,name="Velma",nameLength=5},
{deptno=30,id=103,name="Scooby",nameLength=6},
{deptno=30,id=102,name="Shaggy",nameLength=6}]
: {deptno:int, id:int, name:string, nameLength:int} list
Chaining relational operators - step 4
- from e in emps
= order e.deptno, e.id desc
= yield {e.name, nameLength = String.size e.name, e.id, e.deptno}
= where nameLength > 4;
val it =
[{deptno=20,id=101,name="Velma",nameLength=5},
{deptno=30,id=103,name="Scooby",nameLength=6},
{deptno=30,id=102,name="Shaggy",nameLength=6}]
: {deptno:int, id:int, name:string, nameLength:int} list
Chaining relational operators - step 5
- from e in emps
= order e.deptno, e.id desc
= yield {e.name, nameLength = String.size e.name, e.id, e.deptno}
= where nameLength > 4
= group deptno compute c = count, s = sum of nameLength;
val it =
[{c=1,deptno=20,s=5},
{c=2,deptno=30,s=12}]
: {c:int, deptno:int, s:int} list
Chaining relational operators - step 6
- from e in emps
= order e.deptno, e.id desc
= yield {e.name, nameLength = String.size e.name, e.id, e.deptno}
= where nameLength > 4
= group deptno compute c = count, s = sum of nameLength
= where s > 10;
val it =
[{c=2,deptno=30,s=12}]
: {c:int, deptno:int, s:int} list
Chaining relational operators
= from e in emps
= order e.deptno, e.id desc
= yield {e.name, nameLength = String.size e.name, e.id, e.deptno}
= where nameLength > 4
= group deptno compute c = count, s = sum of nameLength
= where s > 10
= yield c + s;
val it = [14] : int list
Chaining relational operators
Morel from e in emps
order e.deptno, e.id desc
yield {e.name, nameLength = String.size e.name, e.id, e.deptno}
where nameLength > 4
group deptno compute c = count, s = sum of nameLength
where s > 10
yield c + s;
SQL (almost
equivalent)
SELECT c + s
FROM (SELECT deptno, COUNT(*) AS c, SUM(nameLength) AS s
FROM (SELECT e.name, CHAR_LENGTH(e.ename) AS nameLength, e.id, e.deptno
FROM (SELECT *
FROM emps AS e
ORDER BY e.deptno, e.id DESC))
WHERE nameLength > 4
GROUP BY deptno
HAVING s > 10)
Java (very
approximately)
for (Emp e : emps) {
String name = e.name;
int nameLength = name.length();
int id = e.id;
int deptno = e.deptno;
if (nameLength > 4) {
...
4. Functional +
Data parallel
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
Integrating Morel with Apache
Calcite
Apache Calcite
Apache Calcite Server
JDBC server
JDBC client
SQL parser &
validator
Query
planner
Adapter
Pluggable
rewrite rules
Pluggable
stats / cost
Pluggable
catalog
Physical
operators
Storage
Relational
algebra
Toolkit for writing a DBMS
Many parts are optional or
pluggable
Relational algebra is the
core
SELECT d.name, COUNT(*) AS c
FROM Emps AS e
JOIN Depts AS d USING (deptno)
WHERE e.age > 50
GROUP BY d.deptno
HAVING COUNT(*) > 5
ORDER BY c DESC
Relational algebra
Based on set theory, plus
operators: Project, Filter,
Aggregate, Union, Join, Sort
Calcite provides:
● SQL to relational algebra
● Query planner
● Physical operators to execute
plan
● An adapter system to make
external data sources look
like tables
Scan [Emps] Scan [Depts]
Join [e.deptno = d.deptno]
Filter [e.age > 50]
Aggregate [deptno, COUNT(*) AS c]
Filter [c > 5]
Project [name, c]
Sort [c DESC]
Algebraic rewrite
Scan [Emps] Scan [Depts]
Join [e.deptno = d.deptno]
Filter [e.age > 50]
Aggregate [deptno, COUNT(*) AS c]
Filter [c > 5]
Project [name, c]
Sort [c DESC]
Calcite optimizes queries by
applying rewrite rules that
preserve semantics.
Planner uses dynamic
programming, seeking the
lowest total cost.
SELECT d.name, COUNT(*) AS c
FROM (SELECT * FROM Emps
WHERE e.age > 50) AS e
JOIN Depts AS d USING (deptno)
GROUP BY d.deptno
HAVING COUNT(*) > 5
ORDER BY c DESC
Integration with Apache Calcite - schemas
Expose Calcite schemas as named
records, each field of which is a table.
A default Morel connection has
variables foodmart and scott
(connections to hsqldb via Calcite’s
JDBC adapter).
Connections to other Calcite
adapters (Apache Cassandra, Druid,
Kafka, Pig, Redis) are also possible.
- foodmart;
val it = {account=<relation>, currency=<relation>,
customer=<relation>, ...} : ...;
- scott;
val it = {bonus=<relation>, dept=<relation>,
emp=<relation>, salgrade=<relation>} : ...;
- scott.dept;
val it =
[{deptno=10,dname="ACCOUNTING",loc="NEW YORK"},
{deptno=20,dname="RESEARCH",loc="DALLAS"},
{deptno=30,dname="SALES",loc="CHICAGO"},
{deptno=40,dname="OPERATIONS",loc="BOSTON"}]
: {deptno:int, dname:string, loc:string} list
- from d in scott.dept
= where notExists (from e in scott.emp
= where e.deptno = d.deptno
= andalso e.job = "CLERK");
val it =
[{deptno=40,dname="OPERATIONS",loc="BOSTON"}]
: {deptno:int, dname:string, loc:string} list
- Sys.set ("hybrid", true);
val it = () : unit
- from d in scott.dept
= where notExists (from e in scott.emp
= where e.deptno = d.deptno
= andalso e.job = "CLERK");
val it = [{deptno=40,dname="OPERATIONS",loc="BOSTON"}]
: {deptno:int, dname:string, loc:string} list
Integration with Apache Calcite - relational algebra
In “hybrid” mode, Morel’s compiler identifies
sections of Morel programs that are
relational operations and converts them to
Calcite relational algebra.
Calcite can them optimize these and
execute them on their native systems.
- Sys.set ("hybrid", true);
val it = () : unit
- from d in scott.dept
= where notExists (from e in scott.emp
= where e.deptno = d.deptno
= andalso e.job = "CLERK");
val it = [{deptno=40,dname="OPERATIONS",loc="BOSTON"}]
: {deptno:int, dname:string, loc:string} list
- Sys.plan();
val it = "calcite(plan
LogicalProject(deptno=[$0], dname=[$1], loc=[$2])
LogicalFilter(condition=[IS NULL($4)])
LogicalJoin(condition=[=($0, $3)], joinType=[left])
LogicalProject(deptno=[$0], dname=[$1], loc=[$2])
JdbcTableScan(table=[[scott, DEPT]])
LogicalProject(deptno=[$0], $f1=[true])
LogicalAggregate(group=[{0}])
LogicalProject(deptno=[$1])
LogicalFilter(condition=[AND(=($5, 'CLERK'), IS NOT NULL($1))])
LogicalProject(comm=[$6], deptno=[$7], empno=[$0],ename=[$1], hiredate=[$4], job=[$2]
JdbcTableScan(table=[[scott, EMP]]))" : string
Integration with Apache Calcite - relational algebra
In “hybrid” mode, Morel’s compiler identifies
sections of Morel programs that are
relational operations and converts them to
Calcite relational algebra.
Calcite can them optimize these and
execute them on their native systems.
Optimization
Relational query optimization
● Applies to relational operators (~10 core
operators + UDFs) not general-purpose
code
● Transformation rules that match patterns
(e.g. Filter on Project)
● Decisions based on cost & statistics
● Hybrid plans - choose target engine, sort
order
● Materialized views
● Macro-level optimizations
Functional program optimization
● Applies to typed lambda calculus
● Inline constant lambda expressions
● Eliminate dead code
● Defend against mutually recursive groups
● Careful to not inline expressions that are
used more than once (increasing code
size) or which bind closures more often
(increasing running time)
● Micro-level optimizations
[GM] “The Volcano Optimizer Generator: Extensibility and Efficient Search” (Graefe, McKenna 1993)
[PJM] “Secrets of the Glasgow Haskell Compiler inliner” (Peyton Jones, Marlow 2002)
WordCount Data parallel Relational
(Apache Calcite)
Functional
(Morel)
Concisely specified in a functional
programming language (Morel)
Translated into relational algebra (Apache
Calcite)
Executed in the data-parallel framework of
your choice
from line in lines,
word in split line
group word compute c = count;
5. Functional +
Deductive
Data parallel
(e.g. Flume,
Spark)
Relational
(e.g. SQL)
Functional
(e.g. Haskell,
Standard ML,
Morel)
Deductive
(e.g. Datalog)
An elegant way to recursively
define relations in a functional
programming language*
*Designed but not yet implemented
# SQL
SELECT *
FROM parents
WHERE parent = ‘elrond’;
parent child
====== =======
elrond arwen
elrond elladan
elrond elrohir
# Morel
from (parent, child) in parents
where parent = “elrond”;
[(“elrond”, “arwen”),
(“elrond”, “elladan”),
(“elrond”, “elrohir”)];
# Datalog
is_parent(earendil, elrond).
is_parent(elrond, arwen).
is_parent(elrond, elladan).
is_parent(elrond, elrohir).
answer(X) :- is_parent(elrond, X).
X = arwen
X = elladan
X = elrohir
Parents (a base relation)
# SQL
CREATE VIEW ancestors AS
WITH RECURSIVE a AS (
SELECT parent AS ancestor,
child AS descendant
FROM parents
UNION ALL
SELECT a.ancestor, p.child
FROM parents AS p
JOIN a ON a.descendant = p.parent)
SELECT * FROM a;
SELECT * FROM ancestors
WHERE descendant = ‘arwen’;
ancestor descendant
======== =======
earendil arwen
elrond arwen
# Morel
fun ancestors () =
(from (x, y) in parents)
union
(from (x, y) in parents,
(y2, z) in ancestors ()
where y = y2
yield (x, z));
from (ancestor, descendant) in ancestors ()
where descendant = “arwen”;
Uncaught exception: StackOverflowError
# Datalog
is_ancestor(X, Y) :- is_parent(X, Y).
is_ancestor(X, Y) :- is_parent(X, Z),
is_ancestor(Z, Y).
answer(X) :- is_ancestor(X, arwen).
Ancestors (a recursively-defined relation)
# SQL
CREATE VIEW ancestors AS
WITH RECURSIVE a AS (
SELECT parent AS ancestor,
child AS descendant
FROM parents
UNION ALL
SELECT a.ancestor, p.child
FROM parents AS p
JOIN a ON a.descendant = p.parent)
SELECT * FROM a;
SELECT * FROM ancestors
WHERE descendant = ‘arwen’;
ancestor descendant
======== =======
earendil arwen
elrond arwen
# Morel
fun ancestors () =
(from (x, y) in parents)
union
(from (x, y) in parents,
(y2, z) in ancestors ()
where y = y2
yield (x, z));
from (ancestor, descendant) in ancestors ()
where descendant = “arwen”;
Uncaught exception: StackOverflowError
# Datalog
is_ancestor(X, Y) :- is_parent(X, Y).
is_ancestor(X, Y) :- is_parent(X, Z),
is_ancestor(Z, Y).
answer(X) :- is_ancestor(X, arwen).
Ancestors (a recursively-defined relation)
“StackOverflowError”
is not easy to fix. It’s
hard to a write a
recursive function that
iterates until a set
reaches a fixed point.
# Morel “forwards” relation
# Relation defined using algebra
- fun clerks () =
= from e in emps
= where e.job = “CLERK”;
# Query uses regular iteration
- from e in clerks,
= d in depts
= where d.deptno = e.deptno
= andalso d.loc = “DALLAS”
= yield e.name;
val it =
[“SMITH”, “ADAMS”] : string list;
# Morel “backwards” relation
# Relation defined using a predicate
- fun isClerk e =
= e.job = “CLERK”;
# Query uses a mixture of constrained
# and regular iteration
- from e suchThat isClerk e,
= d in depts
= where d.deptno = e.deptno
= andalso d.loc = “DALLAS”
= yield e.name;
val it =
[“SMITH”, “ADAMS”] : string list;
Two ways to define a relation
# Morel “forwards” relation
# Relation defined using algebra
- fun clerks () =
= from e in emps
= where e.job = “CLERK”;
# Query uses regular iteration
- from e in clerks,
= d in depts
= where d.deptno = e.deptno
= andalso d.loc = “DALLAS”
= yield e.name;
val it =
[“SMITH”, “ADAMS”] : string list;
# Morel “backwards” relation
# Relation defined using a predicate
- fun isClerk e =
= e.job = “CLERK”;
# Query uses a mixture of constrained
# and regular iteration
- from e suchThat isClerk e,
= d in depts
= where d.deptno = e.deptno
= andalso d.loc = “DALLAS”
= yield e.name;
val it =
[“SMITH”, “ADAMS”] : string list;
Two ways to define a relation
“suchThat” keyword
converts a bool
function into a relation
# Morel
- fun isAncestor (x, z) =
= (x, z) elem parents
= orelse exists (
= from y suchThat isAncestor (x, y)
= andalso (y, z) elem parents);
- from a suchThat isAncestor (a, “arwen”);
val it = [“earendil”, “elrond”] : string list
- from d suchThat isAncestor (“earendil”, d);
val it = [“elrond”, “arwen”, “elladan”, “elrohir”] : string list
Recursively-defined predicate relation
Morel
Functional query language
Rich type system, concise as SQL, Turing complete
Combines (relational) query optimization with
(lambda) FP optimization techniques
Execute in Java interpreter and/or data-parallel
engines from e in emps,
d in depts
where e.deptno = d.deptno
yield {d.deptno, e.name};
from line in lines,
word in split line
group word compute c = count;
from a suchThat isAncestor (a, “arwen”);
Thank you!
Questions?
Julian Hyde, Looker/Google • @julianhyde
https://github.com/hydromatic/morel • @morel_lang
https://calcite.apache.org • @ApacheCalcite
Extra slides
Other operators
from in
where
yield
order … desc
group … compute
count max min sum
intersect union except
exists notExists elem notElem only
Compared to other languages
Haskell – Haskell comprehensions are more general (monads vs lists). Morel is
focused on relational algebra and probably benefits from a simpler type system.
Builder APIs (e.g. LINQ, FlumeJava, Cascading, Apache Spark) – Builder APIs
are two languages. E.g. Apache Spark programs are Scala that builds an
algebraic expression. Scala code (especially lambdas) is sprinkled throughout the
algebra. Scala compilation is not integrated with algebra optimization.
SQL – SQL’s type system does not have parameterized types, so higher order
functions are awkward. Tables and columns have separate namespaces, which
complicates handling of nested collections. Functions and temporary variables
cannot be defined inside queries, so queries are not Turing-complete (unless you
use recursive query gymnastics).
Standard ML: types
Type Example
Primitive types bool
char
int
real
string
unit
true: bool
#"a": char
~1: int
3.14: real
"foo": string
(): unit
Function types string -> int
int * int -> int
String.size
fn (x, y) => x + y * y
Tuple types int * string (10, "Fred")
Record types {empno:int, name:string} {empno=10, name="Fred"}
Collection types int list
(bool * (int -> int)) list
[1, 2, 3]
[(true, fn i => i + 1)]
Type variables 'a List.length: 'a list -> int
Functional programming ↔ relational programming
Functional
programming
in-the-small
- fun squareList [] = []
= | squareList (x :: xs) = x * x :: squareList xs;
val squareList = fn : int list -> int list
- squareList [1, 2, 3];
val it = [1,4,9] : int list
Functional
programming
in-the-large
- fun squareList xs = List.map (fn x => x * x) xs;
val squareList = fn : int list -> int list
- squareList [1, 2, 3];
val it = [1,4,9] : int list
Relational
programming
- fun squareList xs =
= from x in xs
= yield x * x;
- squareList [1, 2, 3];
val it = [1,4,9] : int list
wordCount again
wordCount
in-the-small
- fun wordCount list = ...;
val wordCount = fn : string list -> {count:int, word:string} list
wordCount
in-the-large using
mapReduce
- fun mapReduce mapper reducer list = ...;
val mapReduce = fn : ('a -> ('b * 'c) list) ->
('b * 'c list -> 'd) -> 'a list -> ('b * 'd) list
- fun wc_mapper line =
= List.map (fn w => (w, 1)) (split line);
val wc_mapper = fn : string -> (string * int) list
- fun wc_reducer (key, values) =
= List.foldl (fn (x, y) => x + y) 0 values;
val wc_reducer = fn : 'a * int list -> int
- fun wordCount list = mapReduce wc_mapper wc_reducer list;
val wordCount = fn : string list -> {count:int, word:string} list
Relational
implementation of
mapReduce
- fun mapReduce mapper reducer list =
= from e in list,
= (k, v) in mapper e
= group k compute c = (fn vs => reducer (k, vs)) of v;
group …. compute
- fun median reals = ...;
val median = fn : real list -> real
- from e in emps
= group x = e.deptno mod 3,
= e.job
= compute c = count,
= sum of e.sal,
= m = median of e.sal + e.comm;
val it = {c:int, job:string, m:real, sum:real, x.int} list
https://github.com/hydromatic/morel/discussions/106
Join the discussion!
LucidDB
C++
Calcite evolution - origins as an SMP DB
JDBC server
JDBC client
Physical
operators
Rewrite rules
Catalog
Storage & data
structures
SQL parser &
validator
Query
planner
Relational
algebra
Java
Optiq
Calcite evolution - pluggable components
JDBC server
JDBC client
Physical
operators
Rewrite rules
SQL parser &
validator
Query
planner
Relational
algebra
Optiq
Calcite evolution - pluggable components
JDBC server
JDBC client
SQL parser &
validator
Query
planner
Adapter
Pluggable
rewrite rules
Pluggable
stats / cost
Pluggable
catalog
Physical
operators
Storage
Relational
algebra
Apache Calcite
Calcite evolution - separate JDBC stack
Avatica
JDBC server
JDBC client
Pluggable
rewrite rules
Pluggable
stats / cost
Pluggable
catalog
ODBC client
Adapter
Physical
operators
Storage
SQL parser &
validator
Query
planner
Relational
algebra
Apache Calcite
Calcite evolution - federation via adapters
Pluggable
rewrite rules
Pluggable
stats / cost
Pluggable
catalog
Adapter
Physical
operators
Storage
SQL parser &
validator
Query
planner
Relational
algebra
SQL
Calcite evolution - federation via adapters
Apache Calcite
JDBC adapter
Pluggable
rewrite rules
Pluggable
stats / cost
Enumerable
adapter
MongoDB
adapter
File adapter
(CSV, JSON, Http)
Apache Kafka
adapter
Apache Spark
adapter
Pluggable
catalog
SQL
SQL parser &
validator
Query
planner
Relational
algebra
Calcite evolution - federation via adapters
Apache Calcite
Pluggable
rewrite rules
Pluggable
stats / cost
Enumerable
adapter
Pluggable
catalog
SQL
SQL parser &
validator
Query
planner
Relational
algebra
Calcite evolution - federation via adapters
Apache Calcite
JDBC adapter
Pluggable
rewrite rules
Pluggable
stats / cost
Pluggable
catalog
SQL
SQL parser &
validator
Query
planner
Relational
algebra
Apache Calcite
Calcite evolution - SQL dialects
Pluggable
rewrite rules
Pluggable parser, lexical,
conformance, operators
Pluggable
SQL dialect
SQL
SQL
SQL parser &
validator
Query
planner
Relational
algebra
JDBC adapter
Apache Calcite
Calcite evolution - other front-end languages
SQL
Adapter
Physical
operators
Storage
SQL parser &
validator
Query
planner
Relational
algebra
Calcite evolution - other front-end languages
Pig
RelBuilder
Adapter
Physical
operators
Morel
Storage
Query
planner
Relational
algebra
Datalog
SQL parser &
validator
SQL

Weitere ähnliche Inhalte

Was ist angesagt?

Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandasPiyush rai
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching moduleSander Timmer
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factorskrishna singh
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In RRsquared Academy
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in RFlorian Uhlitz
 
Python for R Users
Python for R UsersPython for R Users
Python for R UsersAjay Ohri
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply FunctionSakthi Dasans
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with RShareThis
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data StructureSakthi Dasans
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Guy Lebanon
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingAlberto Labarga
 

Was ist angesagt? (20)

Python Pandas
Python PandasPython Pandas
Python Pandas
 
R language introduction
R language introductionR language introduction
R language introduction
 
Language R
Language RLanguage R
Language R
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandas
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching module
 
Sparklyr
SparklyrSparklyr
Sparklyr
 
R language
R languageR language
R language
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
Pandas
PandasPandas
Pandas
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
Python for R Users
Python for R UsersPython for R Users
Python for R Users
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
R programming language
R programming languageR programming language
R programming language
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with R
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data Structure
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)
 
R learning by examples
R learning by examplesR learning by examples
R learning by examples
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 

Ähnlich wie Is there a perfect data-parallel programming language? (Experiments with Morel and Apache Calcite)

Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming languageJulian Hyde
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Martin Odersky
 
Introduction to Functional Languages
Introduction to Functional LanguagesIntroduction to Functional Languages
Introduction to Functional Languagessuthi
 
Scala Language Intro - Inspired by the Love Game
Scala Language Intro - Inspired by the Love GameScala Language Intro - Inspired by the Love Game
Scala Language Intro - Inspired by the Love GameAntony Stubbs
 
A brief introduction to lisp language
A brief introduction to lisp languageA brief introduction to lisp language
A brief introduction to lisp languageDavid Gu
 
Cypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semanticsCypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semanticsopenCypher
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statisticsKrishna Dhakal
 
Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...
Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...
Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...Hamidreza Soleimani
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine LearningLuciano Resende
 
Choose'10: Ralf Laemmel - Dealing Confortably with the Confusion of Tongues
Choose'10: Ralf Laemmel - Dealing Confortably with the Confusion of TonguesChoose'10: Ralf Laemmel - Dealing Confortably with the Confusion of Tongues
Choose'10: Ralf Laemmel - Dealing Confortably with the Confusion of TonguesCHOOSE
 
Using Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPUUsing Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPUSkills Matter
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurSiddharth Mathur
 
Basic and logical implementation of r language
Basic and logical implementation of r language Basic and logical implementation of r language
Basic and logical implementation of r language Md. Mahedi Mahfuj
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packagesAjay Ohri
 

Ähnlich wie Is there a perfect data-parallel programming language? (Experiments with Morel and Apache Calcite) (20)

Morel, a data-parallel programming language
Morel, a data-parallel programming languageMorel, a data-parallel programming language
Morel, a data-parallel programming language
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
 
Introduction to Functional Languages
Introduction to Functional LanguagesIntroduction to Functional Languages
Introduction to Functional Languages
 
Perl_Part6
Perl_Part6Perl_Part6
Perl_Part6
 
Easy R
Easy REasy R
Easy R
 
Scala Language Intro - Inspired by the Love Game
Scala Language Intro - Inspired by the Love GameScala Language Intro - Inspired by the Love Game
Scala Language Intro - Inspired by the Love Game
 
A brief introduction to lisp language
A brief introduction to lisp languageA brief introduction to lisp language
A brief introduction to lisp language
 
Cypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semanticsCypher.PL: an executable specification of Cypher semantics
Cypher.PL: an executable specification of Cypher semantics
 
Special topics in finance lecture 2
Special topics in finance   lecture 2Special topics in finance   lecture 2
Special topics in finance lecture 2
 
Practical cats
Practical catsPractical cats
Practical cats
 
Presentation on use of r statistics
Presentation on use of r statisticsPresentation on use of r statistics
Presentation on use of r statistics
 
Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...
Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...
Architecting Scalable Platforms in Erlang/OTP | Hamidreza Soleimani | Diginex...
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Choose'10: Ralf Laemmel - Dealing Confortably with the Confusion of Tongues
Choose'10: Ralf Laemmel - Dealing Confortably with the Confusion of TonguesChoose'10: Ralf Laemmel - Dealing Confortably with the Confusion of Tongues
Choose'10: Ralf Laemmel - Dealing Confortably with the Confusion of Tongues
 
Mule Expression language
Mule Expression languageMule Expression language
Mule Expression language
 
Using Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPUUsing Language Oriented Programming to Execute Computations on the GPU
Using Language Oriented Programming to Execute Computations on the GPU
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
 
Basic and logical implementation of r language
Basic and logical implementation of r language Basic and logical implementation of r language
Basic and logical implementation of r language
 
Scala for curious
Scala for curiousScala for curious
Scala for curious
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packages
 

Mehr von Julian Hyde

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteJulian Hyde
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Julian Hyde
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQLJulian Hyde
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityJulian Hyde
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're IncubatingJulian Hyde
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteJulian Hyde
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesJulian Hyde
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineeringJulian Hyde
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databasesJulian Hyde
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and FastJulian Hyde
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteJulian Hyde
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 

Mehr von Julian Hyde (20)

Building a semantic/metrics layer using Calcite
Building a semantic/metrics layer using CalciteBuilding a semantic/metrics layer using Calcite
Building a semantic/metrics layer using Calcite
 
Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!Cubing and Metrics in SQL, oh my!
Cubing and Metrics in SQL, oh my!
 
Adding measures to Calcite SQL
Adding measures to Calcite SQLAdding measures to Calcite SQL
Adding measures to Calcite SQL
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
The evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its CommunityThe evolution of Apache Calcite and its Community
The evolution of Apache Calcite and its Community
 
What to expect when you're Incubating
What to expect when you're IncubatingWhat to expect when you're Incubating
What to expect when you're Incubating
 
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache CalciteOpen Source SQL - beyond parsers: ZetaSQL and Apache Calcite
Open Source SQL - beyond parsers: ZetaSQL and Apache Calcite
 
Efficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databasesEfficient spatial queries on vanilla databases
Efficient spatial queries on vanilla databases
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...
 
Lazy beats Smart and Fast
Lazy beats Smart and FastLazy beats Smart and Fast
Lazy beats Smart and Fast
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 

Kürzlich hochgeladen

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 

Kürzlich hochgeladen (20)

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 

Is there a perfect data-parallel programming language? (Experiments with Morel and Apache Calcite)

  • 1. Is there a perfect data-parallel programming language? Experiments with Morel and Apache Calcite Julian Hyde @julianhyde (Looker/Google) March 7th, 2022
  • 2. Agenda Programming language types (and how they might be unified) Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog)
  • 3. Agenda 1. Data parallel 2. Functional 3. Functional + Relational 4. Functional + Data parallel 5. Functional + Deductive Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog)
  • 4. Agenda 1. Data parallel 2. Functional 3. Functional + Relational 4. Functional + Data parallel 5. Functional + Deductive Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog)
  • 5. Agenda 1. Data parallel 2. Functional 3. Functional + Relational 4. Functional + Data parallel 5. Functional + Deductive Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog)
  • 6. Agenda 1. Data parallel 2. Functional 3. Functional + Relational 4. Functional + Data parallel 5. Functional + Deductive Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog)
  • 7. Agenda 1. Data parallel 2. Functional 3. Functional + Relational 4. Functional + Data parallel 5. Functional + Deductive Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog)
  • 8. Agenda 1. Data parallel 2. Functional 3. Functional + Relational 4. Functional + Data parallel 5. Functional + Deductive Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog)
  • 9.
  • 10. 1. Data parallel Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog) Explained via two data parallel problems (text indexing and WordCount)
  • 12. . . . Text files (input) Index files (output) Building a document index a b c x y z
  • 13. x y z split split split reduce reduce reduce reduce Text files (input) Index files (output) Building a document index a b c
  • 14. yellow: 12, zebra: 9 WordCount split split split reduce reduce reduce reduce Text files (input) Word counts (output) aardvark: 3, badger: 7 ... ...
  • 15. Data-parallel programming Batch processing Input is a large, immutable data sets Processing can be processed in parallel pipelines Significant chance of hardware failure during execution Related programming models: ● Stream and incremental processing ● Deductive and graph programming
  • 16. Data-parallel programming Data-parallel framework (e.g. MapReduce, FlumeJava, Apache Spark) (*) You write two small functions fun wc_mapper line = List.map (fn w => (w, 1)) (split line); fun wc_reducer (key, values) = List.foldl (fn (x, y) => x + y) 0 values; (*) The framework provides mapReduce fun mapReduce mapper reducer list = ...; (*) Combine them to build your program fun wordCount list = mapReduce wc_mapper wc_reducer list;
  • 17. Data-parallel programming Data-parallel framework (e.g. MapReduce, FlumeJava, Apache Spark) (*) You write two small functions fun wc_mapper line = List.map (fn w => (w, 1)) (split line); fun wc_reducer (key, values) = List.foldl (fn (x, y) => x + y) 0 values; (*) The framework provides mapReduce fun mapReduce mapper reducer list = ...; (*) Combine them to build your program fun wordCount list = mapReduce wc_mapper wc_reducer list; SQL SELECT word, COUNT(*) AS c FROM Documents AS d, LATERAL TABLE (split(d.text)) AS word // requires a ‘split’ UDF GROUP BY word;
  • 18. Data-parallel programming Data-parallel framework (e.g. MapReduce, FlumeJava, Apache Spark) (*) You write two small functions fun wc_mapper line = List.map (fn w => (w, 1)) (split line); fun wc_reducer (key, values) = List.foldl (fn (x, y) => x + y) 0 values; (*) The framework provides mapReduce fun mapReduce mapper reducer list = ...; (*) Combine them to build your program fun wordCount list = mapReduce wc_mapper wc_reducer list; SQL SELECT word, COUNT(*) AS c FROM Documents AS d, LATERAL TABLE (split(d.text)) AS word // requires a ‘split’ UDF GROUP BY word; Morel from line in lines, word in split line (*) requires ‘split’ function - see later... group word compute c = count
  • 19. What are our options? Extend SQL Extend a functional programming language We’ll need to: ● Allow functions defined in queries ● Add relations-as-values ● Add functions-as-values ● Modernize the type system (adding type variables, function types, algebraic types) ● Write an optimizing compiler We’ll need to: ● Add syntax for relational operations ● Map onto external data ● Write a query optimizer Nice stuff we get for free: ● Algebraic types ● Pattern matching ● Inline function and value declarations
  • 20. Morel is a functional programming language. It is derived from Standard ML, and is extended with list comprehensions and other relational operators. Like Standard ML, Morel has parametric and algebraic data types with Hindley-Milner type inference. Morel is implemented in Java, and is optimized and executed using a combination of techniques from functional programming language compilers and database query optimizers.
  • 22. Extend Standard ML Standard ML is strongly typed, and can very frequently deduce every type in a program. Standard ML has record types. (These are important for queries.) Haven’t decided whether Morel is eager (like Standard ML) or lazy (like Haskell and SQL) Haskell’s type system is more powerful. So, Haskell programs often have explicit types. Standard ML Haskell Scala ML Java F# C# Morel SQL Lisp Scheme Clojure OCaml 1975 2019
  • 23. Morel themes Early stage language The target audience is SQL users Less emphasis on abstract algebra Program compilation, query optimization Functional programming in-the-small, and in-the-large
  • 24.
  • 25. 2. Functional Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog) A quick introduction to Standard ML All of the following examples work in both Standard ML and Morel.
  • 26. Standard ML: values and types - "Hello, world!"; val it = "Hello, world!" : string - 1 + 2; val it = 3 : int - ~1.5; val it = ~1.5 : real - [1, 1, 2, 3, 5]; val it = [1,1,2,3,5] : int list - fn i => i mod 2 = 1; val it = fn : int -> bool - (1, "a"); val it = (1,"a") : int * string - {name = "Fred", empno = 100}; val it = {empno=100,name="Fred"} : {empno:int, name:string}
  • 27. Standard ML: variables and functions - val x = 1; val x = 1 : int - val isOdd = fn i => i mod 2 = 1; val isOdd = fn : int -> bool - fun isOdd i = i mod 2 = 0; val isOdd = fn : int -> bool - isOdd x; val it = true : bool - let = val x = 6 = fun isOdd i = i mod 2 = 1 = in = isOdd x = end; val it = false : bool val assigns a value to a variable. fun declares a function. ● fun is syntactic sugar for val ... = fn ... => ... let allows you to make several declarations before evaluating an expression.
  • 28. - datatype 'a tree = = EMPTY = | LEAF of 'a = | NODE of ('a * 'a tree * 'a tree); Algebraic data types, case, and recursion datatype declares an algebraic data type. 'a Is a type variable and therefore 'a tree is a polymorphic type.
  • 29. - datatype 'a tree = = EMPTY = | LEAF of 'a = | NODE of ('a * 'a tree * 'a tree); - val t = = NODE (1, LEAF 2, NODE (3, EMPTY, LEAF 7)); EMPTY Algebraic data types, case, and recursion datatype declares an algebraic data type. 'a Is a type variable and therefore 'a tree is a polymorphic type. Define an instance of tree using its constructors NODE, LEAF and EMPTY. LEAF 2 NODE 1 NODE 3 LEAF 7
  • 30. - datatype 'a tree = = EMPTY = | LEAF of 'a = | NODE of ('a * 'a tree * 'a tree); - val t = = NODE (1, LEAF 2, NODE (3, EMPTY, LEAF 7)); - val rec sumTree = fn t => = case t of EMPTY => 0 = | LEAF i => i = | NODE (i, l, r) => = i + sumTree l + sumTree r; val sumTree = fn : int tree -> int - sumTree t; val it = 13 : int Algebraic data types, case, and recursion datatype declares an algebraic data type. 'a Is a type variable and therefore 'a tree is a polymorphic type. Define an instance of tree using its constructors NODE, LEAF and EMPTY. case matches patterns, deconstructing data types, and binding variables as it goes. sumTree use case and calls itself recursively. EMPTY LEAF 2 NODE 1 NODE 3 LEAF 7
  • 31. - datatype 'a tree = = EMPTY = | LEAF of 'a = | NODE of ('a * 'a tree * 'a tree); - val t = = NODE (1, LEAF 2, NODE (3, EMPTY, LEAF 7)); - val rec sumTree = fn t => = case t of EMPTY => 0 = | LEAF i => i = | NODE (i, l, r) => = i + sumTree l + sumTree r; val sumTree = fn : int tree -> int - sumTree t; val it = 13 : int - fun sumTree EMPTY = 0 = | sumTree (LEAF i) = i = | sumTree (NODE (i, l, r)) = = i + sumTree l + sumTree r; val sumTree = fn : int tree -> int Algebraic data types, case, and recursion datatype declares an algebraic data type. 'a Is a type variable and therefore 'a tree is a polymorphic type. Define an instance of tree using its constructors NODE, LEAF and EMPTY. case matches patterns, deconstructing data types, and binding variables as it goes. sumTree use case and calls itself recursively. EMPTY LEAF 2 NODE 1 NODE 3 LEAF 7 fun is a shorthand for case and val rec.
  • 32. Relations and higher-order functions Relations are lists of records. A higher-order function is a function whose arguments or return value are functions. Common higher-order functions that act on lists: ● List.filter – equivalent to the Filter relational operator (SQL WHERE) ● List.map – equivalent to Project relational operator (SQL SELECT) ● flatMap – as List.map, but may produce several output elements per input element #deptno is a system-generated function that accesses the deptno field of a record. - val emps = [ = {id = 100, name = "Fred", deptno = 10}, = {id = 101, name = "Velma", deptno = 20}, = {id = 102, name = "Shaggy", deptno = 30}, = {id = 103, name = "Scooby", deptno = 30}]; - List.filter (fn e => #deptno e = 30) emps; val it = [{deptno=30,id=102,name="Shaggy"}, {deptno=30,id=103,name="Scooby"}] : {deptno:int, id:int, name:string} list SELECT * FROM emps AS e WHERE e.deptno = 30 Equivalent SQL
  • 33. Implementing Join using higher-order functions List.map function is equivalent to Project relational operator (SQL SELECT) We also define a flatMap function. - val depts = [ = {deptno = 10, name = "Sales"}, = {deptno = 20, name = "Marketing"}, = {deptno = 30, name = "R&D"}]; - fun flatMap f xs = List.concat (List.map f xs); val flatMap = fn : ('a -> 'b list) -> 'a list -> 'b list - List.map = (fn (d, e) => {deptno = #deptno d, name = #name e}) = (List.filter = (fn (d, e) => #deptno d = #deptno e) = (flatMap = (fn e => (List.map (fn d => (d, e)) depts)) = emps)); val it = [{deptno=10,name="Fred"}, {deptno=20,name="Velma"}, {deptno=30,name="Shaggy"}, {deptno=30,name="Scooby"}] : {deptno:int, name:string} list SELECT d.deptno, e.name FROM emps AS e, depts AS d WHERE d.deptno = e.deptno Equivalent SQL
  • 34.
  • 35. 3. Functional + Relational Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog) Morel
  • 36. List.map (fn (d, e) => {deptno = #deptno d, name = #name e}) (List.filter (fn (d, e) => #deptno d = #deptno e) (flatMap (fn e => (List.map (fn d => (d, e)) depts)) emps)); Implementing Join in Morel using from Morel extensions to Standard ML: ● from operator creates a list comprehension ● x.field is shorthand for #field x ● {#field} is shorthand for {field = #field x} SELECT d.deptno, e.name FROM emps AS e, depts AS d WHERE d.deptno = e.deptno Equivalent SQL
  • 37. List.map (fn (d, e) => {deptno = #deptno d, name = #name e}) (List.filter (fn (d, e) => #deptno d = #deptno e) (flatMap (fn e => (List.map (fn d => (d, e)) depts)) emps)); from e in emps, d in depts where #deptno e = #deptno d yield {deptno = #deptno d, name = #name e}; Implementing Join in Morel using from Morel extensions to Standard ML: ● from operator creates a list comprehension ● x.field is shorthand for #field x ● {#field} is shorthand for {field = #field x} SELECT d.deptno, e.name FROM emps AS e, depts AS d WHERE d.deptno = e.deptno Equivalent SQL
  • 38. List.map (fn (d, e) => {deptno = #deptno d, name = #name e}) (List.filter (fn (d, e) => #deptno d = #deptno e) (flatMap (fn e => (List.map (fn d => (d, e)) depts)) emps)); from e in emps, d in depts where #deptno e = #deptno d yield {deptno = #deptno d, name = #name e}; from e in emps, d in depts where e.deptno = d.deptno yield {d.deptno, e.name}; Implementing Join in Morel using from Morel extensions to Standard ML: ● from operator creates a list comprehension ● x.field is shorthand for #field x ● {#field} is shorthand for {field = #field x} SELECT d.deptno, e.name FROM emps AS e, depts AS d WHERE d.deptno = e.deptno Equivalent SQL
  • 39. WordCount let fun split0 [] word words = word :: words | split0 (#" " :: s) word words = split0 s "" (word :: words) | split0 (c :: s) word words = split0 s (word ^ (String.str c)) words fun split = List.rev (split0 (String.explode s) "" []) in from line in lines, word in split line group word compute c = count end;
  • 40. WordCount - let = fun split0 [] word words = word :: words = | split0 (#" " :: s) word words = split0 s "" (word :: words) = | split0 (c :: s) word words = split0 s (word ^ (String.str c)) words = fun split = List.rev (split0 (String.explode s) "" []) = in = from line in lines, = word in split line = group word compute c = count = end; val wordCount = fn : string list -> {c:int, word:string} list - wordCount ["a skunk sat on a stump", = "and thunk the stump stunk", = "but the stump thunk the skunk stunk"]; val it = [{c=2,word="a"},{c=3,word="the"},{c=1,word="but"}, {c=1,word="sat"},{c=1,word="and"},{c=2,word="stunk"}, {c=3,word="stump"},{c=1,word="on"},{c=2,word="thunk"}, {c=2,word="skunk"}] : {c:int, word:string} list
  • 41. - let = fun split0 [] word words = word :: words = | split0 (#" " :: s) word words = split0 s "" (word :: words) = | split0 (c :: s) word words = split0 s (word ^ (String.str c)) words = fun split = List.rev (split0 (String.explode s) "" []) = in = from line in lines, = word in split line = group word compute c = count = end; val wordCount = fn : string list -> {c:int, word:string} list - wordCount ["a skunk sat on a stump", = "and thunk the stump stunk", = "but the stump thunk the skunk stunk"]; val it = [{c=2,word="a"},{c=3,word="the"},{c=1,word="but"}, {c=1,word="sat"},{c=1,word="and"},{c=2,word="stunk"}, {c=3,word="stump"},{c=1,word="on"},{c=2,word="thunk"}, {c=2,word="skunk"}] : {c:int, word:string} list WordCount Functional programming “in the small”
  • 42. WordCount - let = fun split0 [] word words = word :: words = | split0 (#" " :: s) word words = split0 s "" (word :: words) = | split0 (c :: s) word words = split0 s (word ^ (String.str c)) words = fun split = List.rev (split0 (String.explode s) "" []) = in = from line in lines, = word in split line = group word compute c = count = end; val wordCount = fn : string list -> {c:int, word:string} list - wordCount ["a skunk sat on a stump", = "and thunk the stump stunk", = "but the stump thunk the skunk stunk"]; val it = [{c=2,word="a"},{c=3,word="the"},{c=1,word="but"}, {c=1,word="sat"},{c=1,word="and"},{c=2,word="stunk"}, {c=3,word="stump"},{c=1,word="on"},{c=2,word="thunk"}, {c=2,word="skunk"}] : {c:int, word:string} list Functional programming “in the large”
  • 43. Functions as views, functions as values - fun emps2 () = = from e in emps = yield {e.id, = e.name, = e.deptno, = comp = fn revenue => case e.deptno of = 30 => e.id + revenue / 2 = | _ => e.id}; val emps2 = fn : unit -> {comp:int -> int, deptno:int, id:int, name:string} list
  • 44. Functions as views, functions as values emps2 is a function that returns a collection - fun emps2 () = = from e in emps = yield {e.id, = e.name, = e.deptno, = comp = fn revenue => case e.deptno of = 30 => e.id + revenue / 2 = | _ => e.id}; val emps2 = fn : unit -> {comp:int -> int, deptno:int, id:int, name:string} list
  • 45. - fun emps2 () = = from e in emps = yield {e.id, = e.name, = e.deptno, = comp = fn revenue => case e.deptno of = 30 => e.id + revenue / 2 = | _ => e.id}; val emps2 = fn : unit -> {comp:int -> int, deptno:int, id:int, name:string} list Functions as views, functions as values The comp field is a function value (in fact, it’s a closure) emps2 is a function that returns a collection
  • 46. Functions as views, functions as values - fun emps2 () = = from e in emps = yield {e.id, = e.name, = e.deptno, = comp = fn revenue => case e.deptno of = 30 => e.id + revenue / 2 = | _ => e.id}; val emps2 = fn : unit -> {comp:int -> int, deptno:int, id:int, name:string} list - fun emps2 () = = from e in emps = yield {e.id, = e.name, = e.deptno, = comp = fn revenue => case e.deptno of = 30 => e.id + revenue / 2 = | _ => e.id}; val emps2 = fn : unit -> {comp:int -> int, deptno:int, id:int, name:string} list - from e in emps2 () = yield {e.name, e.id, c = e.comp 1000}; The comp field is a function value (in fact, it’s a closure) emps2 is a function that returns a collection
  • 47. Functions as views, functions as values - fun emps2 () = = from e in emps = yield {e.id, = e.name, = e.deptno, = comp = fn revenue => case e.deptno of = 30 => e.id + revenue / 2 = | _ => e.id}; val emps2 = fn : unit -> {comp:int -> int, deptno:int, id:int, name:string} list - from e in emps2 () = yield {e.name, e.id, c = e.comp 1000}; val it = [{c=100,id=100,name="Fred"}, {c=101,id=101,name="Velma"}, {c=602,id=102,name="Shaggy"}, {c=603,id=103,name="Scooby"}] : {c:int, id:int, name:string} list
  • 48. Chaining relational operators - from e in emps = order e.deptno, e.id desc = yield {e.name, nameLength = String.size e.name, e.id, e.deptno} = where nameLength > 4 = group deptno compute c = count, s = sum of nameLength = where s > 10 = yield c + s; val it = [14] : int list
  • 49. Chaining relational operators - step 1 - from e in emps; val it = [{deptno=10,id=100,name="Fred"}, {deptno=20,id=101,name="Velma"}, {deptno=30,id=102,name="Shaggy"}, {deptno=30,id=103,name="Scooby"}] : {deptno:int, id:int, name:string} list
  • 50. Chaining relational operators - step 2 - from e in emps = order e.deptno, e.id desc; val it = [{deptno=10,id=100,name="Fred"}, {deptno=20,id=101,name="Velma"}, {deptno=30,id=103,name="Scooby"}, {deptno=30,id=102,name="Shaggy"}] : {deptno:int, id:int, name:string} list
  • 51. Chaining relational operators - step 3 - from e in emps = order e.deptno, e.id desc = yield {e.name, nameLength = String.size e.name, e.id, e.deptno}; val it = [{deptno=10,id=100,name="Fred",nameLength=4}, {deptno=20,id=101,name="Velma",nameLength=5}, {deptno=30,id=103,name="Scooby",nameLength=6}, {deptno=30,id=102,name="Shaggy",nameLength=6}] : {deptno:int, id:int, name:string, nameLength:int} list
  • 52. Chaining relational operators - step 4 - from e in emps = order e.deptno, e.id desc = yield {e.name, nameLength = String.size e.name, e.id, e.deptno} = where nameLength > 4; val it = [{deptno=20,id=101,name="Velma",nameLength=5}, {deptno=30,id=103,name="Scooby",nameLength=6}, {deptno=30,id=102,name="Shaggy",nameLength=6}] : {deptno:int, id:int, name:string, nameLength:int} list
  • 53. Chaining relational operators - step 5 - from e in emps = order e.deptno, e.id desc = yield {e.name, nameLength = String.size e.name, e.id, e.deptno} = where nameLength > 4 = group deptno compute c = count, s = sum of nameLength; val it = [{c=1,deptno=20,s=5}, {c=2,deptno=30,s=12}] : {c:int, deptno:int, s:int} list
  • 54. Chaining relational operators - step 6 - from e in emps = order e.deptno, e.id desc = yield {e.name, nameLength = String.size e.name, e.id, e.deptno} = where nameLength > 4 = group deptno compute c = count, s = sum of nameLength = where s > 10; val it = [{c=2,deptno=30,s=12}] : {c:int, deptno:int, s:int} list
  • 55. Chaining relational operators = from e in emps = order e.deptno, e.id desc = yield {e.name, nameLength = String.size e.name, e.id, e.deptno} = where nameLength > 4 = group deptno compute c = count, s = sum of nameLength = where s > 10 = yield c + s; val it = [14] : int list
  • 56. Chaining relational operators Morel from e in emps order e.deptno, e.id desc yield {e.name, nameLength = String.size e.name, e.id, e.deptno} where nameLength > 4 group deptno compute c = count, s = sum of nameLength where s > 10 yield c + s; SQL (almost equivalent) SELECT c + s FROM (SELECT deptno, COUNT(*) AS c, SUM(nameLength) AS s FROM (SELECT e.name, CHAR_LENGTH(e.ename) AS nameLength, e.id, e.deptno FROM (SELECT * FROM emps AS e ORDER BY e.deptno, e.id DESC)) WHERE nameLength > 4 GROUP BY deptno HAVING s > 10) Java (very approximately) for (Emp e : emps) { String name = e.name; int nameLength = name.length(); int id = e.id; int deptno = e.deptno; if (nameLength > 4) { ...
  • 57.
  • 58. 4. Functional + Data parallel Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog) Integrating Morel with Apache Calcite
  • 59. Apache Calcite Apache Calcite Server JDBC server JDBC client SQL parser & validator Query planner Adapter Pluggable rewrite rules Pluggable stats / cost Pluggable catalog Physical operators Storage Relational algebra Toolkit for writing a DBMS Many parts are optional or pluggable Relational algebra is the core
  • 60. SELECT d.name, COUNT(*) AS c FROM Emps AS e JOIN Depts AS d USING (deptno) WHERE e.age > 50 GROUP BY d.deptno HAVING COUNT(*) > 5 ORDER BY c DESC Relational algebra Based on set theory, plus operators: Project, Filter, Aggregate, Union, Join, Sort Calcite provides: ● SQL to relational algebra ● Query planner ● Physical operators to execute plan ● An adapter system to make external data sources look like tables Scan [Emps] Scan [Depts] Join [e.deptno = d.deptno] Filter [e.age > 50] Aggregate [deptno, COUNT(*) AS c] Filter [c > 5] Project [name, c] Sort [c DESC]
  • 61. Algebraic rewrite Scan [Emps] Scan [Depts] Join [e.deptno = d.deptno] Filter [e.age > 50] Aggregate [deptno, COUNT(*) AS c] Filter [c > 5] Project [name, c] Sort [c DESC] Calcite optimizes queries by applying rewrite rules that preserve semantics. Planner uses dynamic programming, seeking the lowest total cost. SELECT d.name, COUNT(*) AS c FROM (SELECT * FROM Emps WHERE e.age > 50) AS e JOIN Depts AS d USING (deptno) GROUP BY d.deptno HAVING COUNT(*) > 5 ORDER BY c DESC
  • 62. Integration with Apache Calcite - schemas Expose Calcite schemas as named records, each field of which is a table. A default Morel connection has variables foodmart and scott (connections to hsqldb via Calcite’s JDBC adapter). Connections to other Calcite adapters (Apache Cassandra, Druid, Kafka, Pig, Redis) are also possible. - foodmart; val it = {account=<relation>, currency=<relation>, customer=<relation>, ...} : ...; - scott; val it = {bonus=<relation>, dept=<relation>, emp=<relation>, salgrade=<relation>} : ...; - scott.dept; val it = [{deptno=10,dname="ACCOUNTING",loc="NEW YORK"}, {deptno=20,dname="RESEARCH",loc="DALLAS"}, {deptno=30,dname="SALES",loc="CHICAGO"}, {deptno=40,dname="OPERATIONS",loc="BOSTON"}] : {deptno:int, dname:string, loc:string} list - from d in scott.dept = where notExists (from e in scott.emp = where e.deptno = d.deptno = andalso e.job = "CLERK"); val it = [{deptno=40,dname="OPERATIONS",loc="BOSTON"}] : {deptno:int, dname:string, loc:string} list
  • 63. - Sys.set ("hybrid", true); val it = () : unit - from d in scott.dept = where notExists (from e in scott.emp = where e.deptno = d.deptno = andalso e.job = "CLERK"); val it = [{deptno=40,dname="OPERATIONS",loc="BOSTON"}] : {deptno:int, dname:string, loc:string} list Integration with Apache Calcite - relational algebra In “hybrid” mode, Morel’s compiler identifies sections of Morel programs that are relational operations and converts them to Calcite relational algebra. Calcite can them optimize these and execute them on their native systems.
  • 64. - Sys.set ("hybrid", true); val it = () : unit - from d in scott.dept = where notExists (from e in scott.emp = where e.deptno = d.deptno = andalso e.job = "CLERK"); val it = [{deptno=40,dname="OPERATIONS",loc="BOSTON"}] : {deptno:int, dname:string, loc:string} list - Sys.plan(); val it = "calcite(plan LogicalProject(deptno=[$0], dname=[$1], loc=[$2]) LogicalFilter(condition=[IS NULL($4)]) LogicalJoin(condition=[=($0, $3)], joinType=[left]) LogicalProject(deptno=[$0], dname=[$1], loc=[$2]) JdbcTableScan(table=[[scott, DEPT]]) LogicalProject(deptno=[$0], $f1=[true]) LogicalAggregate(group=[{0}]) LogicalProject(deptno=[$1]) LogicalFilter(condition=[AND(=($5, 'CLERK'), IS NOT NULL($1))]) LogicalProject(comm=[$6], deptno=[$7], empno=[$0],ename=[$1], hiredate=[$4], job=[$2] JdbcTableScan(table=[[scott, EMP]]))" : string Integration with Apache Calcite - relational algebra In “hybrid” mode, Morel’s compiler identifies sections of Morel programs that are relational operations and converts them to Calcite relational algebra. Calcite can them optimize these and execute them on their native systems.
  • 65. Optimization Relational query optimization ● Applies to relational operators (~10 core operators + UDFs) not general-purpose code ● Transformation rules that match patterns (e.g. Filter on Project) ● Decisions based on cost & statistics ● Hybrid plans - choose target engine, sort order ● Materialized views ● Macro-level optimizations Functional program optimization ● Applies to typed lambda calculus ● Inline constant lambda expressions ● Eliminate dead code ● Defend against mutually recursive groups ● Careful to not inline expressions that are used more than once (increasing code size) or which bind closures more often (increasing running time) ● Micro-level optimizations [GM] “The Volcano Optimizer Generator: Extensibility and Efficient Search” (Graefe, McKenna 1993) [PJM] “Secrets of the Glasgow Haskell Compiler inliner” (Peyton Jones, Marlow 2002)
  • 66. WordCount Data parallel Relational (Apache Calcite) Functional (Morel) Concisely specified in a functional programming language (Morel) Translated into relational algebra (Apache Calcite) Executed in the data-parallel framework of your choice from line in lines, word in split line group word compute c = count;
  • 67.
  • 68. 5. Functional + Deductive Data parallel (e.g. Flume, Spark) Relational (e.g. SQL) Functional (e.g. Haskell, Standard ML, Morel) Deductive (e.g. Datalog) An elegant way to recursively define relations in a functional programming language* *Designed but not yet implemented
  • 69. # SQL SELECT * FROM parents WHERE parent = ‘elrond’; parent child ====== ======= elrond arwen elrond elladan elrond elrohir # Morel from (parent, child) in parents where parent = “elrond”; [(“elrond”, “arwen”), (“elrond”, “elladan”), (“elrond”, “elrohir”)]; # Datalog is_parent(earendil, elrond). is_parent(elrond, arwen). is_parent(elrond, elladan). is_parent(elrond, elrohir). answer(X) :- is_parent(elrond, X). X = arwen X = elladan X = elrohir Parents (a base relation)
  • 70. # SQL CREATE VIEW ancestors AS WITH RECURSIVE a AS ( SELECT parent AS ancestor, child AS descendant FROM parents UNION ALL SELECT a.ancestor, p.child FROM parents AS p JOIN a ON a.descendant = p.parent) SELECT * FROM a; SELECT * FROM ancestors WHERE descendant = ‘arwen’; ancestor descendant ======== ======= earendil arwen elrond arwen # Morel fun ancestors () = (from (x, y) in parents) union (from (x, y) in parents, (y2, z) in ancestors () where y = y2 yield (x, z)); from (ancestor, descendant) in ancestors () where descendant = “arwen”; Uncaught exception: StackOverflowError # Datalog is_ancestor(X, Y) :- is_parent(X, Y). is_ancestor(X, Y) :- is_parent(X, Z), is_ancestor(Z, Y). answer(X) :- is_ancestor(X, arwen). Ancestors (a recursively-defined relation)
  • 71. # SQL CREATE VIEW ancestors AS WITH RECURSIVE a AS ( SELECT parent AS ancestor, child AS descendant FROM parents UNION ALL SELECT a.ancestor, p.child FROM parents AS p JOIN a ON a.descendant = p.parent) SELECT * FROM a; SELECT * FROM ancestors WHERE descendant = ‘arwen’; ancestor descendant ======== ======= earendil arwen elrond arwen # Morel fun ancestors () = (from (x, y) in parents) union (from (x, y) in parents, (y2, z) in ancestors () where y = y2 yield (x, z)); from (ancestor, descendant) in ancestors () where descendant = “arwen”; Uncaught exception: StackOverflowError # Datalog is_ancestor(X, Y) :- is_parent(X, Y). is_ancestor(X, Y) :- is_parent(X, Z), is_ancestor(Z, Y). answer(X) :- is_ancestor(X, arwen). Ancestors (a recursively-defined relation) “StackOverflowError” is not easy to fix. It’s hard to a write a recursive function that iterates until a set reaches a fixed point.
  • 72. # Morel “forwards” relation # Relation defined using algebra - fun clerks () = = from e in emps = where e.job = “CLERK”; # Query uses regular iteration - from e in clerks, = d in depts = where d.deptno = e.deptno = andalso d.loc = “DALLAS” = yield e.name; val it = [“SMITH”, “ADAMS”] : string list; # Morel “backwards” relation # Relation defined using a predicate - fun isClerk e = = e.job = “CLERK”; # Query uses a mixture of constrained # and regular iteration - from e suchThat isClerk e, = d in depts = where d.deptno = e.deptno = andalso d.loc = “DALLAS” = yield e.name; val it = [“SMITH”, “ADAMS”] : string list; Two ways to define a relation
  • 73. # Morel “forwards” relation # Relation defined using algebra - fun clerks () = = from e in emps = where e.job = “CLERK”; # Query uses regular iteration - from e in clerks, = d in depts = where d.deptno = e.deptno = andalso d.loc = “DALLAS” = yield e.name; val it = [“SMITH”, “ADAMS”] : string list; # Morel “backwards” relation # Relation defined using a predicate - fun isClerk e = = e.job = “CLERK”; # Query uses a mixture of constrained # and regular iteration - from e suchThat isClerk e, = d in depts = where d.deptno = e.deptno = andalso d.loc = “DALLAS” = yield e.name; val it = [“SMITH”, “ADAMS”] : string list; Two ways to define a relation “suchThat” keyword converts a bool function into a relation
  • 74. # Morel - fun isAncestor (x, z) = = (x, z) elem parents = orelse exists ( = from y suchThat isAncestor (x, y) = andalso (y, z) elem parents); - from a suchThat isAncestor (a, “arwen”); val it = [“earendil”, “elrond”] : string list - from d suchThat isAncestor (“earendil”, d); val it = [“elrond”, “arwen”, “elladan”, “elrohir”] : string list Recursively-defined predicate relation
  • 75.
  • 76. Morel Functional query language Rich type system, concise as SQL, Turing complete Combines (relational) query optimization with (lambda) FP optimization techniques Execute in Java interpreter and/or data-parallel engines from e in emps, d in depts where e.deptno = d.deptno yield {d.deptno, e.name}; from line in lines, word in split line group word compute c = count; from a suchThat isAncestor (a, “arwen”);
  • 77. Thank you! Questions? Julian Hyde, Looker/Google • @julianhyde https://github.com/hydromatic/morel • @morel_lang https://calcite.apache.org • @ApacheCalcite
  • 78.
  • 80. Other operators from in where yield order … desc group … compute count max min sum intersect union except exists notExists elem notElem only
  • 81. Compared to other languages Haskell – Haskell comprehensions are more general (monads vs lists). Morel is focused on relational algebra and probably benefits from a simpler type system. Builder APIs (e.g. LINQ, FlumeJava, Cascading, Apache Spark) – Builder APIs are two languages. E.g. Apache Spark programs are Scala that builds an algebraic expression. Scala code (especially lambdas) is sprinkled throughout the algebra. Scala compilation is not integrated with algebra optimization. SQL – SQL’s type system does not have parameterized types, so higher order functions are awkward. Tables and columns have separate namespaces, which complicates handling of nested collections. Functions and temporary variables cannot be defined inside queries, so queries are not Turing-complete (unless you use recursive query gymnastics).
  • 82. Standard ML: types Type Example Primitive types bool char int real string unit true: bool #"a": char ~1: int 3.14: real "foo": string (): unit Function types string -> int int * int -> int String.size fn (x, y) => x + y * y Tuple types int * string (10, "Fred") Record types {empno:int, name:string} {empno=10, name="Fred"} Collection types int list (bool * (int -> int)) list [1, 2, 3] [(true, fn i => i + 1)] Type variables 'a List.length: 'a list -> int
  • 83. Functional programming ↔ relational programming Functional programming in-the-small - fun squareList [] = [] = | squareList (x :: xs) = x * x :: squareList xs; val squareList = fn : int list -> int list - squareList [1, 2, 3]; val it = [1,4,9] : int list Functional programming in-the-large - fun squareList xs = List.map (fn x => x * x) xs; val squareList = fn : int list -> int list - squareList [1, 2, 3]; val it = [1,4,9] : int list Relational programming - fun squareList xs = = from x in xs = yield x * x; - squareList [1, 2, 3]; val it = [1,4,9] : int list
  • 84. wordCount again wordCount in-the-small - fun wordCount list = ...; val wordCount = fn : string list -> {count:int, word:string} list wordCount in-the-large using mapReduce - fun mapReduce mapper reducer list = ...; val mapReduce = fn : ('a -> ('b * 'c) list) -> ('b * 'c list -> 'd) -> 'a list -> ('b * 'd) list - fun wc_mapper line = = List.map (fn w => (w, 1)) (split line); val wc_mapper = fn : string -> (string * int) list - fun wc_reducer (key, values) = = List.foldl (fn (x, y) => x + y) 0 values; val wc_reducer = fn : 'a * int list -> int - fun wordCount list = mapReduce wc_mapper wc_reducer list; val wordCount = fn : string list -> {count:int, word:string} list Relational implementation of mapReduce - fun mapReduce mapper reducer list = = from e in list, = (k, v) in mapper e = group k compute c = (fn vs => reducer (k, vs)) of v;
  • 85. group …. compute - fun median reals = ...; val median = fn : real list -> real - from e in emps = group x = e.deptno mod 3, = e.job = compute c = count, = sum of e.sal, = m = median of e.sal + e.comm; val it = {c:int, job:string, m:real, sum:real, x.int} list
  • 87. LucidDB C++ Calcite evolution - origins as an SMP DB JDBC server JDBC client Physical operators Rewrite rules Catalog Storage & data structures SQL parser & validator Query planner Relational algebra Java
  • 88. Optiq Calcite evolution - pluggable components JDBC server JDBC client Physical operators Rewrite rules SQL parser & validator Query planner Relational algebra
  • 89. Optiq Calcite evolution - pluggable components JDBC server JDBC client SQL parser & validator Query planner Adapter Pluggable rewrite rules Pluggable stats / cost Pluggable catalog Physical operators Storage Relational algebra
  • 90. Apache Calcite Calcite evolution - separate JDBC stack Avatica JDBC server JDBC client Pluggable rewrite rules Pluggable stats / cost Pluggable catalog ODBC client Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra
  • 91. Apache Calcite Calcite evolution - federation via adapters Pluggable rewrite rules Pluggable stats / cost Pluggable catalog Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra SQL
  • 92. Calcite evolution - federation via adapters Apache Calcite JDBC adapter Pluggable rewrite rules Pluggable stats / cost Enumerable adapter MongoDB adapter File adapter (CSV, JSON, Http) Apache Kafka adapter Apache Spark adapter Pluggable catalog SQL SQL parser & validator Query planner Relational algebra
  • 93. Calcite evolution - federation via adapters Apache Calcite Pluggable rewrite rules Pluggable stats / cost Enumerable adapter Pluggable catalog SQL SQL parser & validator Query planner Relational algebra
  • 94. Calcite evolution - federation via adapters Apache Calcite JDBC adapter Pluggable rewrite rules Pluggable stats / cost Pluggable catalog SQL SQL parser & validator Query planner Relational algebra
  • 95. Apache Calcite Calcite evolution - SQL dialects Pluggable rewrite rules Pluggable parser, lexical, conformance, operators Pluggable SQL dialect SQL SQL SQL parser & validator Query planner Relational algebra JDBC adapter
  • 96. Apache Calcite Calcite evolution - other front-end languages SQL Adapter Physical operators Storage SQL parser & validator Query planner Relational algebra
  • 97. Calcite evolution - other front-end languages Pig RelBuilder Adapter Physical operators Morel Storage Query planner Relational algebra Datalog SQL parser & validator SQL