SociaLite: High-level Query Language
for Big Data Analysis
Jiwon Seo, *Jongsoo Park, Jaeho Shin, Stephen Guo, and Monica S. Lam
STANFORD MOBISOCIAL RESEARCH GROUP
* INTEL PARALLEL RESEARCH LAB
Problems in existing platforms
• Too difficult (low-level primitives)
• Inefficient (not network bound)
• Too many (sub)frameworks
  – Graph analysis
  – Data mining (or machine learning)
  – Relational query
Why Another Big Data Platform?
SociaLite is a high-level query language
• Easy & efficient
• Compiled to distributed code
• 1,000x faster than Hadoop
• Hadoop compatible
• Python integration
• Good for
  – Graph analysis
  – Data mining
  – Relational queries
Introducing SociaLite
• Concepts in SociaLite
  – Distributed Tables
  – Rules
  – Python Integration (Jython & CPython)
• Analysis Algorithms
  – Shortest Paths, PageRank
  – K-Means, Logistic Regression
• Evaluation
• Demo
Outline
• Primary data structure in SociaLite
• Column oriented storage
• <type>
  – Primitive types
  – Object types
• opts
  – indexby, sortby, …
Distributed In-Memory Tables
Table (<type> cx, …, (<type> cy,… (<type> cz…))) opts.
Distributed In-Memory Tables
Foo(int x, int y).
Foo(int x, (int y)).
Bar[int x](int y).
Bar[int x:0..10](int y).
[Figure: example rows of Foo in flat and nested layouts, and rows of Bar partitioned across Machine 1 and Machine 2]
Table options
• indexby <column>
• sortby <column>
• multiset
Column options
• range
• (distributed) partition
Distributed In-Memory Tables
Foo(int x, int y) indexby x.
Foo(int x, int y) sortby x.
Foo(int x, int y) multiset.
Foo(int x:0..100, int y).
Foo[int x](int y).
Rules
Foo[a](c) :- Bar[a](b), Qux[b](c).
Rule head: Foo[a](c)    Rule body: Bar[a](b), Qux[b](c)
[Figure: step-by-step join of Bar and Qux on b; Foo accumulates the tuples (1, 9), (1, 10), and (9, 9)]
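The rule above can be read as a join of Bar and Qux on the shared variable b. The following is a minimal Python sketch of that reading, not SociaLite's actual evaluator; the function name `join_rule` and the sample tuples are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def join_rule(bar, qux):
    """Evaluate Foo[a](c) :- Bar[a](b), Qux[b](c). over sets of pairs."""
    # Index Qux by its first column so each Bar tuple needs one lookup.
    qux_by_b = defaultdict(set)
    for b, c in qux:
        qux_by_b[b].add(c)
    # Set semantics: duplicate (a, c) pairs collapse into a single tuple.
    return {(a, c) for a, b in bar for c in qux_by_b.get(b, ())}

# Hypothetical sample data, for illustration only.
bar = {(1, 2), (1, 3), (9, 11)}
qux = {(2, 9), (2, 10), (11, 9)}
foo = join_rule(bar, qux)
print(sorted(foo))
```

Indexing Qux by b plays the same role as the `indexby` table option: the join avoids scanning the whole table for every Bar tuple.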
Distributed Join
Foo[a](c) :- Bar[a](b), Qux[b](c).
[Figure: Bar tuple (1, 2) and Qux tuple (2, 9), stored on Machine 1 and Machine 2, are joined to produce Foo tuple (1, 9)]
Distributed Join
Foo[a](c) :- Qux[b](c), Bar[a](b).
[Figure: the same Bar tuple (1, 2) and Qux tuple (2, 9) on Machine 1 and Machine 2, with the rule body reordered]
Parallel Evaluation
Foo[a](c) :- Bar[a](b), Qux[b](c).
[Figure: on each of Machine 1 and Machine 2, Bar is split into four partitions, Bar_a through Bar_d]
Parallel Evaluation
Foo[a](c) :- Bar[a](b), Qux[b](c).
Machine 1
Foo[a](c) :- Bar1a[a](b), Qux[b](c).
Foo[a](c) :- Bar1b[a](b), Qux[b](c).
Foo[a](c) :- Bar1c[a](b), Qux[b](c).
Foo[a](c) :- Bar1d[a](b), Qux[b](c).
Machine 2
Foo[a](c) :- Bar2a[a](b), Qux[b](c).
Foo[a](c) :- Bar2b[a](b), Qux[b](c).
Foo[a](c) :- Bar2c[a](b), Qux[b](c).
Foo[a](c) :- Bar2d[a](b), Qux[b](c).
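The per-partition rules above suggest evaluating the same rule body independently on each slice of Bar and then merging the results. Below is a rough single-process Python sketch of that idea. The names `eval_partition` and `parallel_join`, the hash-based partitioning, and the sample data are assumptions made for illustration; this is not how SociaLite schedules work internally.

```python
from concurrent.futures import ThreadPoolExecutor

def eval_partition(bar_part, qux):
    """Evaluate Foo[a](c) :- Bar[a](b), Qux[b](c). on one partition of Bar."""
    return {(a, c) for a, b in bar_part for b2, c in qux if b == b2}

def parallel_join(bar, qux, num_parts=4):
    # Split Bar by its first column, mimicking the per-partition rules.
    parts = [set() for _ in range(num_parts)]
    for a, b in bar:
        parts[hash(a) % num_parts].add((a, b))
    # Each partition is an independent task; results are merged at the end.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        results = list(pool.map(eval_partition, parts, [qux] * num_parts))
    foo = set()
    for partial in results:
        foo |= partial
    return foo

# Hypothetical sample data, for illustration only.
bar = {(1, 2), (1, 3), (9, 11)}
qux = {(2, 9), (2, 10), (11, 9)}
print(sorted(parallel_join(bar, qux)))
```

Python threads share one interpreter lock, so this sketch shows the structure of the decomposition rather than an actual speedup.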
Aggregation
Foo[a]($min(c)) :- Bar[a](b), Qux[b](c).
The $min aggregate function is applied to tuples in Foo
having the same first column value.
• Built-in aggregate functions
  – min, max, sum, avg, argmin
• User-defined functions
  – in Java or Python
Aggregation
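As a rough illustration of the `$min` rule, the Python sketch below keeps one value per key of Foo and replaces it only when a smaller value is derived. The function name `min_aggregate` and the sample tuples are hypothetical.

```python
def min_aggregate(bar, qux):
    """Evaluate Foo[a]($min(c)) :- Bar[a](b), Qux[b](c)."""
    foo = {}
    for a, b in bar:
        for b2, c in qux:
            # Keep only the smallest c seen so far for each value of a.
            if b == b2 and (a not in foo or c < foo[a]):
                foo[a] = c
    return foo

# Hypothetical sample data, for illustration only.
bar = {(1, 2), (1, 3), (9, 11)}
qux = {(2, 9), (2, 10), (3, 4), (11, 9)}
foo_min = min_aggregate(bar, qux)
print(foo_min)
```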
• Head table also appears in rule body
Foo(a,c) :- Foo(a,b), Bar(b,c).
• Semantics
  – the rule is executed repeatedly until Foo no longer changes
Recursive Rules
SociaLite: Datalog Extensions for Efficient Social Network Analysis, ICDE’13
Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis, VLDB’14
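The "execute until no changes" semantics of a recursive rule can be sketched as a simple fixpoint loop in Python. This is a naive evaluation written for clarity; the function name `fixpoint` and the sample data are hypothetical, and a real engine would avoid re-deriving known tuples.

```python
def fixpoint(foo, bar):
    """Apply Foo(a,c) :- Foo(a,b), Bar(b,c). until Foo stops growing."""
    foo = set(foo)
    while True:
        derived = {(a, c) for a, b in foo for b2, c in bar if b == b2}
        if derived <= foo:
            # Nothing new was derived, so the fixpoint has been reached.
            return foo
        foo |= derived

# Hypothetical sample data: Bar contains a cycle 2 -> 3 -> 4 -> 2.
initial_foo = {(1, 2)}
bar_edges = {(2, 3), (3, 4), (4, 2)}
closure = fixpoint(initial_foo, bar_edges)
print(sorted(closure))
```

Because Foo is a set, the loop terminates even though Bar contains a cycle.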
Recursive Rules
`Edge(int s, (int t, double len)) indexby s.
 Path(int n, double dist) indexby n.`
`Path(t, $min(d)) :- t=$SRC, d=0;
                  :- Path(n, d1), Edge(n, t, d2), d=d1+d2.`
Shortest-paths algorithm expressed with recursion and aggregation
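To make the combination of recursion and `$min` concrete, here is a small Python sketch that repeats the recursive rule body until no distance improves. It is a Bellman-Ford-style loop written for readability, assuming no negative cycles; the function name `shortest_paths` and the sample graph are hypothetical.

```python
def shortest_paths(edges, src):
    """edges: iterable of (s, t, length) triples; returns {node: distance}."""
    path = {src: 0.0}  # base case: Path(t, 0) for t = SRC
    changed = True
    while changed:
        changed = False
        for n, t, length in edges:
            if n in path:
                d = path[n] + length
                # $min keeps only the smallest distance derived for t.
                if t not in path or d < path[t]:
                    path[t] = d
                    changed = True
    return path

# Hypothetical sample graph, for illustration only.
sample_edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
dist = shortest_paths(sample_edges, src=0)
print(dist)
```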
• SociaLite queries in Python code
  – `Queries are quoted in backticks`
• Python ↔ SociaLite
  – Python functions and variables are accessible in SociaLite queries
  – SociaLite tables are readable from Python
Python Integration (Jython)
Python Integration
print "This is Python code!"
# now we use SociaLite queries below
`Foo[int i](String s).
 Foo[i](s) :- i=42, s="the answer".`
v = "Python variable"
`Foo[i](s) :- i=43, s=$v.`
@returns(str)
def func(): return "Python func"
`Foo[i](s) :- i=44, s=$func().`
for i, s in `Foo[i](s)`:
    print i, s
• Graph algorithms
  – Shortest Paths
  – PageRank
• Data mining algorithms
  – K-Means
  – Logistic regression
Analysis Algorithms
• Shortest Path
Graph Algorithm
`Edge(int s, (int t, double len)) indexby s.
 Path(int n, double dist) indexby n.`
`Path(t, $min(d)) :- t=$SRC, d=0;
                  :- Path(n, d1), Edge(n, t, d2), d=d1+d2.`
• PageRank
Graph Algorithm
`Rank(n, 0, r) :- Node(n), r=1.0/$N.`
for t in range(30):
    `Rank(pi, $t+1, $sum(r)) :- Node(pi), r=0.15*1.0/$N;
                             :- Rank(pj, $t, r1), Edge(pj, pi),
                                EdgeCnt(pj, cnt), r=0.85*r1/cnt.`
d = damping factor (we used 0.85)
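For readers less familiar with the rule syntax, the same PageRank iteration can be sketched in plain Python. The function name `pagerank` and the three-node sample graph are hypothetical; the sketch assumes every node that has a rank also has at least one outgoing edge, as the EdgeCnt join in the rule implies.

```python
def pagerank(nodes, edges, iterations=30, d=0.85):
    """nodes: list of ids; edges: list of (src, dst) pairs."""
    n = len(nodes)
    # EdgeCnt(pj, cnt): number of outgoing edges per source node.
    out_count = {}
    for src, _ in edges:
        out_count[src] = out_count.get(src, 0) + 1
    rank = {v: 1.0 / n for v in nodes}  # Rank(n, 0, r)
    for _ in range(iterations):
        # First rule body: every node receives (1 - d) / N.
        new_rank = {v: (1.0 - d) / n for v in nodes}
        # Second rule body: $sum of d * r1 / cnt over incoming edges.
        for src, dst in edges:
            new_rank[dst] += d * rank[src] / out_count[src]
        rank = new_rank
    return rank

# Hypothetical sample graph: a directed 3-cycle.
ranks = pagerank([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
print(ranks)
```

On a symmetric cycle every node keeps the same rank, which makes the sample easy to check by hand.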
• K-Means
Data Mining Algorithm
for i in range(50):
    `Center(cid, $avg(p)) :- Data(id, p), Cluster(id, $i, c),
                            cid=c.value.`
    `Cluster(id, $i+1, $argmin(idx, d)) :-
                            Data(id, p), Center(idx, a),
                            d=$getDiff(p, a).`
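The two K-Means rules alternate between averaging the points of each cluster and reassigning each point to its nearest center. A plain-Python sketch of that loop follows. The helper `sq_dist` stands in for the slide's user-defined `$getDiff` and is an assumption, as are the function name `kmeans` and the sample points.

```python
def sq_dist(p, q):
    """Squared Euclidean distance; a stand-in for the slide's $getDiff."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(data, assignment, iterations=50):
    """data: {id: point tuple}; assignment: {id: initial cluster id}."""
    centers = {}
    for _ in range(iterations):
        # Center(cid, $avg(p)): average the points currently in each cluster.
        sums, counts = {}, {}
        for pid, p in data.items():
            cid = assignment[pid]
            acc = sums.setdefault(cid, [0.0] * len(p))
            for k, x in enumerate(p):
                acc[k] += x
            counts[cid] = counts.get(cid, 0) + 1
        centers = {cid: tuple(s / counts[cid] for s in acc)
                   for cid, acc in sums.items()}
        # Cluster(id, i+1, $argmin(idx, d)): pick the nearest center.
        new_assignment = {}
        for pid, p in data.items():
            new_assignment[pid] = min(
                centers, key=lambda cid: sq_dist(p, centers[cid]))
        assignment = new_assignment
    return centers, assignment

# Hypothetical sample data: two well-separated groups, mixed at the start.
points = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (10.0, 10.0), 3: (10.0, 11.0)}
centers, clusters = kmeans(points, {0: 0, 1: 1, 2: 0, 3: 1})
print(centers, clusters)
```

A cluster that loses all of its points simply disappears from `centers` in this sketch; handling that case differently is left out for brevity.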
• Logistic Regression
Data Mining Algorithm
for i in range(0, 100):
    `Gradient($i, $sum(w)) :- Data(id, p), Weight($i, w1),
                             dot=$dot(w1, p), y=$sigmoid(dot),
                             w=$computeWeights(p, y).`
    `Weight($i+1, w) :- Weight($i, w1),
                       Gradient($i, g), w=$vecSum(w1, g).`
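The rules above sum a per-example update into Gradient and then add it to the current weights. The slides do not show the body of `$computeWeights`, so the sketch below assumes a standard batch gradient step with a fixed learning rate; that update formula, the `lr` parameter, the function name `train`, and the sample data are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, dim, iterations=100, lr=0.1):
    """data: list of (feature tuple, label) pairs with label 0 or 1."""
    w = [0.0] * dim
    for _ in range(iterations):
        # Gradient(i, $sum(w)): accumulate one update per example.
        grad = [0.0] * dim
        for x, label in data:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for k in range(dim):
                # Assumed form of $computeWeights: lr * (label - y) * x.
                grad[k] += lr * (label - y) * x[k]
        # Weight(i+1, w): vector sum of the old weights and the gradient.
        w = [wi + gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical sample data: two mirrored points with opposite labels.
samples = [((2.0, 1.0), 1), ((-2.0, -1.0), 0)]
weights = train(samples, dim=2)
print(weights)
```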
• Single-thread performance
• Multi-thread performance (on 16-core machine)
• Distributed performance (up to 64 machines)
Evaluation
Single-thread
[Figure: bar chart comparing optimized Java and SociaLite on Shortest Paths, PageRank, Mutual Neighbors, Connected Components, Triangles, and Clustering Coefficients]
SociaLite is as fast as highly optimized Java,
or ~30% slower than optimized C++
Multi-thread
[Figure: six panels (Shortest Paths, PageRank, Mutual Neighbors, Connected Components, Triangle, Clustering Coefficients) plotting execution time and parallelization speedup, with ideal speedup for reference, against the number of threads (1 to 16)]
Distributed
[Figure: four log-scale panels (Breadth First Search, PageRank, Triangle, Collaborative Filtering) comparing Native, CombBLAS, GraphLab, SociaLite, and Giraph at 1, 4, 16, and 64 machines; y-axes show execution time or time per iteration in seconds]
SociaLite is
• Distributed query language
• Easy and efficient
• Integration with Python
• Algorithms in SociaLite (graph, data mining)
• Competitive performance
Summary
jiwon @ stanford.edu
http://socialite.stanford.edu
Questions?
Two experimental front-ends
IPython
Gephi
GitHub data analysis
SociaLite + Gephi
Project/developer network
• Edge if a developer contributes to a project
Demo
• Custom memory allocator (temporary table)
• Optimized serialization
• Direct ByteBuffer (network buffer)
• Multiple network channels among workers
System Optimizations
Inside Worker Node
[Figure: receiver, worker, and sender components sharing a network buffer pool, connected to other workers and the master]
System Overview (standalone mode)
[Figure: Python integration (preprocessing), compiler, eval task builder, and worker threads]
System Overview (distributed mode)
[Figure: one master and several workers on top of a distributed file system (HDFS)]
• Table column can be
  – Bloom filter
  – Sketches
Approximation
Bloom Filter
• Probabilistic set data structure
  – Elements represented as bits
  – Cannot enumerate elements
• Quickly (approximately) computes set membership
  – Can have false positives, but not false negatives
Approximation
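The properties listed for Bloom filters can be seen in a minimal Python sketch. This is a generic textbook construction, not SociaLite's implementation; the class name, the default sizes, and the use of SHA-256 with a seed prefix to derive several hash positions are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Set-membership sketch: false positives possible, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every position is set; elements cannot be enumerated.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

friends = BloomFilter()
for person_id in (1, 2, 3):
    friends.add(person_id)
print(1 in friends, 2 in friends, 3 in friends)
```

Only the bit array is stored, which is why memory use is small and why the original elements cannot be listed back out.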
Analysis example
• Social Network (friendship)
• Each person's friends-of-friends
• Count the number of those people who work at a startup
• Call it a Startup Score
Approximation
Approximation
Foaf(i, f) :- Friend(i, f).
Foaf(i, ff) :- Friend(i, f), Friend(f, ff).
StartupScore(i, $inc(1)) :- Foaf(i, ff), WorkAt(ff, "Startup").
                        Exact    Approximation   Comparison
Exec time (min)         28.9     19.4            32.8% faster
Memory usage (GB)       26.0     3.0             11.5% usage
Accuracy (<10% error)   100.0%   92.5%
(2nd column of Foaf table is represented with a Bloom filter)
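For reference, the exact (non-approximate) version of the three rules can be sketched in Python as follows. The function name `startup_scores`, the representation of WorkAt as a dictionary with one employer per person, and the sample data are assumptions for illustration; the Bloom-filter variant measured in the table is not reproduced here.

```python
def startup_scores(friend, work_at, company="Startup"):
    """friend: set of (i, f) pairs; work_at: {person: company}."""
    friends_of = {}
    for i, f in friend:
        friends_of.setdefault(i, set()).add(f)
    scores = {}
    for i, direct in friends_of.items():
        # Foaf(i, f) and Foaf(i, ff): direct friends plus their friends.
        foaf = set(direct)
        for f in direct:
            foaf |= friends_of.get(f, set())
        # $inc(1): one increment per distinct Foaf tuple that matches.
        count = sum(1 for ff in foaf if work_at.get(ff) == company)
        if count:
            scores[i] = count
    return scores

# Hypothetical sample data, for illustration only.
friend_pairs = {(1, 2), (2, 3), (2, 4), (3, 1)}
employers = {2: "Startup", 3: "BigCo", 4: "Startup"}
scores = startup_scores(friend_pairs, employers)
print(scores)
```

The `foaf` set built per person is the part that dominates memory in the exact version, which is the column the slide replaces with a Bloom filter.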
System Overview
[Figure: a query arrives at the master, which sends the compiled query to each worker; master and workers run on top of a distributed file system]
System Overview
[Figure: idle messages exchanged between the workers and the master, on top of a distributed file system]
• Query compiler
  – Parser
  – Analyzer
  – Code generator (Java source code)
  – Bytecode compiler
• Task scheduler
• Worker threads
• Network IO threads
System Components
Master Node
Worker Node

Weitere ähnliche Inhalte

Was ist angesagt?

What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performancePiotr Przymus
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionKazuki Fujikawa
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探台灣資料科學年會
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentationPyDataParis
 
Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Michael Stumpf
 
Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud Jianfeng Zhang
 
TensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubTensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubJeongkyu Shin
 
5 efficient-matching.ppt
5 efficient-matching.ppt5 efficient-matching.ppt
5 efficient-matching.pptmustafa sarac
 
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNetAlex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNetAI Frontiers
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesAndrew Ferlitsch
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersBayu Aldi Yansyah
 
Memory efficient pytorch
Memory efficient pytorchMemory efficient pytorch
Memory efficient pytorchHyungjoo Cho
 
Structured Interactive Scores with formal semantics
Structured Interactive Scores with formal semanticsStructured Interactive Scores with formal semantics
Structured Interactive Scores with formal semanticsMauricio Toro-Bermudez, PhD
 
[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R台灣資料科學年會
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to pythonActiveState
 
Intro to Python (High School) Unit #3
Intro to Python (High School) Unit #3Intro to Python (High School) Unit #3
Intro to Python (High School) Unit #3Jay Coskey
 

Was ist angesagt? (20)

What’s eating python performance
What’s eating python performanceWhat’s eating python performance
What’s eating python performance
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
Python for R users
Python for R usersPython for R users
Python for R users
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
 
Simple APIs and innovative documentation
Simple APIs and innovative documentationSimple APIs and innovative documentation
Simple APIs and innovative documentation
 
Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...
 
Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud Pig: Data Analysis Tool in Cloud
Pig: Data Analysis Tool in Cloud
 
TensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubTensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow Hub
 
5 efficient-matching.ppt
5 efficient-matching.ppt5 efficient-matching.ppt
5 efficient-matching.ppt
 
Cs gate-2011
Cs gate-2011Cs gate-2011
Cs gate-2011
 
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNetAlex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning Libraries
 
BEST gr-bertool
BEST gr-bertoolBEST gr-bertool
BEST gr-bertool
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning Practitioners
 
Memory efficient pytorch
Memory efficient pytorchMemory efficient pytorch
Memory efficient pytorch
 
Structured Interactive Scores with formal semantics
Structured Interactive Scores with formal semanticsStructured Interactive Scores with formal semantics
Structured Interactive Scores with formal semantics
 
[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
 
Math synonyms
Math synonymsMath synonyms
Math synonyms
 
Intro to Python (High School) Unit #3
Intro to Python (High School) Unit #3Intro to Python (High School) Unit #3
Intro to Python (High School) Unit #3
 

Andere mochten auch

AvocadoDB query language (DRAFT!)
AvocadoDB query language (DRAFT!)AvocadoDB query language (DRAFT!)
AvocadoDB query language (DRAFT!)avocadodb
 
Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...Sanif Sanif
 
Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)Shubham Gupta
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphMongoDB
 
Airline flights delay prediction- 2014 Spring Data Mining Project
Airline flights delay prediction- 2014 Spring Data Mining ProjectAirline flights delay prediction- 2014 Spring Data Mining Project
Airline flights delay prediction- 2014 Spring Data Mining ProjectHaozhe Wang
 
Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Mingxuan Li
 

Andere mochten auch (9)

AvocadoDB query language (DRAFT!)
AvocadoDB query language (DRAFT!)AvocadoDB query language (DRAFT!)
AvocadoDB query language (DRAFT!)
 
Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...
 
Phase1review
Phase1reviewPhase1review
Phase1review
 
Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
 
Airline flights delay prediction- 2014 Spring Data Mining Project
Airline flights delay prediction- 2014 Spring Data Mining ProjectAirline flights delay prediction- 2014 Spring Data Mining Project
Airline flights delay prediction- 2014 Spring Data Mining Project
 
BIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPT
BIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPTBIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPT
BIG DATA TO AVOID WEATHER RELATED FLIGHT DELAYS PPT
 
Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Ähnlich wie SociaLite: High-level Query Language for Big Data Analysis

Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiDatabricks
 
Numba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyNumba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyTravis Oliphant
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfFlavio W. Brasil
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow규영 허
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Naoki (Neo) SATO
 
Reducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology MappingReducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology Mappingsatrajit
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsUniversity of Washington
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniquesmark_landry
 
Making fitting in RooFit faster
Making fitting in RooFit fasterMaking fitting in RooFit faster
Making fitting in RooFit fasterPatrick Bos
 
SQLGitHub - Access GitHub API with SQL-like syntaxes
SQLGitHub - Access GitHub API with SQL-like syntaxesSQLGitHub - Access GitHub API with SQL-like syntaxes
SQLGitHub - Access GitHub API with SQL-like syntaxesJasmine Chen
 
Introduction to datastructure and algorithm
Introduction to datastructure and algorithmIntroduction to datastructure and algorithm
Introduction to datastructure and algorithmPratik Mota
 
Gate Previous Years Papers
Gate Previous Years PapersGate Previous Years Papers
Gate Previous Years PapersRahul Jain
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsTravis Oliphant
 

Ähnlich wie SociaLite: High-level Query Language for Big Data Analysis (20)

Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Numba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyNumba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPy
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdf
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
 
AD3251-Data Structures Design-Notes-Searching-Hashing.pdf
AD3251-Data Structures  Design-Notes-Searching-Hashing.pdfAD3251-Data Structures  Design-Notes-Searching-Hashing.pdf
AD3251-Data Structures Design-Notes-Searching-Hashing.pdf
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
 
Reducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology MappingReducing Structural Bias in Technology Mapping
Reducing Structural Bias in Technology Mapping
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore Environments
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniques
 
Making fitting in RooFit faster
Making fitting in RooFit fasterMaking fitting in RooFit faster
Making fitting in RooFit faster
 
SQLGitHub - Access GitHub API with SQL-like syntaxes
SQLGitHub - Access GitHub API with SQL-like syntaxesSQLGitHub - Access GitHub API with SQL-like syntaxes
SQLGitHub - Access GitHub API with SQL-like syntaxes
 
Introduction to datastructure and algorithm
Introduction to datastructure and algorithmIntroduction to datastructure and algorithm
Introduction to datastructure and algorithm
 
N flavors of streaming
N flavors of streamingN flavors of streaming
N flavors of streaming
 
Gate Previous Years Papers
Gate Previous Years PapersGate Previous Years Papers
Gate Previous Years Papers
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

SociaLite: High-level Query Language for Big Data Analysis

  • 1. SociaLite: High-level Query Language for Big Data Analysis. Jiwon Seo, *Jongsoo Park, Jaeho Shin, Stephen Guo, and Monica S. Lam. STANFORD MOBISOCIAL RESEARCH GROUP; *INTEL PARALLEL RESEARCH LAB
  • 2. Why Another Big Data Platform? Problems in existing platforms: too difficult (low-level primitives); inefficient (not network bound); too many (sub-)frameworks: graph analysis, data mining (or machine learning), relational query.
  • 3. Introducing SociaLite. SociaLite is a high-level query language: easy and efficient; compiled to distributed code; 1,000x Hadoop; Hadoop compatible; Python integration. Good for graph analysis, data mining, and relational queries.
  • 4. Outline. Concepts in SociaLite: distributed tables, rules, Python integration (Jython and CPython). Analysis algorithms: shortest paths, PageRank, K-Means, logistic regression. Evaluation. Demo.
  • 5. Distributed In-Memory Tables. The primary data structure in SociaLite; column-oriented storage. Declaration syntax: Table(<type> cx, …, (<type> cy, … (<type> cz …))) opts. Here <type> is a primitive type or an object type, and opts are table options: indexby, sortby, …
  • 6. Distributed In-Memory Tables. Example declarations: Foo(int x, int y). (flat table); Foo(int x, (int y)). (nested table); Bar[int x](int y). and Bar[int x:0..10](int y). (tables partitioned across Machine 1 and Machine 2). [Figure: example contents of each table layout.]
  • 7. Distributed In-Memory Tables. Table options: indexby <column>, as in Foo(int x, int y) indexby x.; sortby <column>, as in Foo(int x, int y) sortby x.; multiset, as in Foo(int x, int y) multiset. Column options: range, as in Foo(int x:0..100, int y).; (distributed) partition, as in Foo[int x](int y).
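The partitioned declarations on these slides, Bar[int x](int y) and Bar[int x:0..10](int y), can be pictured with a small Python sketch. It assumes rows are placed by hashing the first column when no range is given and by splitting the declared range evenly when one is given; the function names, the two-machine setup, and the sample rows are illustrative and are not SociaLite API.

```python
from collections import defaultdict

def hash_partition(rows, num_machines):
    """Assign each (x, y) row to a machine by hashing the first column (assumed policy)."""
    shards = defaultdict(list)
    for x, y in rows:
        shards[hash(x) % num_machines].append((x, y))
    return dict(shards)

def range_partition(rows, lo, hi, num_machines):
    """Assign each (x, y) row to a machine by where x falls in [lo, hi] (assumed policy)."""
    width = (hi - lo + 1 + num_machines - 1) // num_machines  # ceiling division
    shards = defaultdict(list)
    for x, y in rows:
        shards[(x - lo) // width].append((x, y))
    return dict(shards)

bar = [(1, 2), (2, 8), (3, 4), (9, 7)]          # sample rows
by_hash = hash_partition(bar, 2)                 # Bar[int x](int y)
by_range = range_partition(bar, 0, 10, 2)        # Bar[int x:0..10](int y)
```

With a declared range, neighbouring keys land on the same machine, which is one plausible reason to expose the range as a column option.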
  • 8. Rules. Foo[a](c) :- Bar[a](b), Qux[b](c). Rule head: Foo[a](c). Rule body: Bar[a](b), Qux[b](c).
  • 9. Rules. Foo[a](c) :- Bar[a](b), Qux[b](c). Bar = {(1, 2), (1, 3), (8, 4), (8, 7), (9, 11)}; Qux = {(2, 9), (2, 10), (5, 4), (10, 7), (11, 9)}; Foo = {} (initially empty).
  • 10. Rules. Same rule and tables; joining Bar (1, 2) with Qux (2, 9) and (2, 10) gives Foo = {(1, 9), (1, 10)}.
  • 11. Rules. Same rule and tables; joining Bar (9, 11) with Qux (11, 9) gives Foo = {(1, 9), (1, 10), (9, 9)}.
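The rule on the slides above behaves like a join on the shared variable b. A minimal Python sketch, using the Bar and Qux tuples read off the slides, is given below; the hash-join strategy and the function name are assumptions made for illustration, not a description of how SociaLite evaluates the rule.

```python
from collections import defaultdict

bar = [(1, 2), (1, 3), (8, 4), (8, 7), (9, 11)]
qux = [(2, 9), (2, 10), (5, 4), (10, 7), (11, 9)]

def join_rule(bar, qux):
    """Evaluate Foo[a](c) :- Bar[a](b), Qux[b](c) as a hash join on b."""
    qux_by_b = defaultdict(list)
    for b, c in qux:
        qux_by_b[b].append(c)
    foo = set()
    for a, b in bar:
        for c in qux_by_b.get(b, []):
            foo.add((a, c))
    return foo

foo = join_rule(bar, qux)
print(sorted(foo))  # [(1, 9), (1, 10), (9, 9)]
```

The result matches the Foo table built up across the three slides.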
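  • 12. Distributed Join. Foo[a](c) :- Bar[a](b), Qux[b](c). [Figure: Bar, Qux, and Foo are each partitioned across Machine 1 and Machine 2; Bar tuple (1, 2) is joined with Qux tuple (2, 9) to produce Foo tuple (1, 9).]
  • 13. Distributed Join. Foo[a](c) :- Qux[b](c), Bar[a](b). [Figure: the same tables with the body predicates in the opposite order; Bar tuple (1, 2), Qux tuple (2, 9), Machine 1 and Machine 2.]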
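One way to picture the distributed join is a toy simulation in which every tuple lives on the machine that owns its first column, and tuples are shipped when the next lookup lives elsewhere. The placement function `owner`, the message counting, and the 0-based machine numbering are hypothetical choices for this sketch; the slides do not spell out the actual transfer protocol.

```python
from collections import defaultdict

NUM_MACHINES = 2

def owner(key):
    """Hypothetical placement: a tuple lives on the machine owning its first column."""
    return key % NUM_MACHINES

def distributed_join(bar, qux):
    # Place each table's tuples on the machine that owns the first column.
    bar_shards = defaultdict(list)
    qux_shards = defaultdict(lambda: defaultdict(list))
    for a, b in bar:
        bar_shards[owner(a)].append((a, b))
    for b, c in qux:
        qux_shards[owner(b)][b].append(c)

    foo_shards = defaultdict(set)
    messages = 0
    for src, tuples in bar_shards.items():
        for a, b in tuples:
            dst = owner(b)              # machine holding Qux[b]
            if dst != src:
                messages += 1           # ship (a, b) to where Qux[b] lives
            for c in qux_shards[dst].get(b, []):
                home = owner(a)         # machine holding Foo[a]
                if home != dst:
                    messages += 1       # ship the result (a, c) to Foo[a]'s machine
                foo_shards[home].add((a, c))
    return dict(foo_shards), messages

foo_shards, messages = distributed_join([(1, 2)], [(2, 9)])
```

For the single pair of tuples on the slide, the sketch ships (1, 2) to the machine holding Qux[2] and ships the result (1, 9) back, for two messages in total.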
  • 14. Parallel Evaluation. Foo[a](c) :- Bar[a](b), Qux[b](c). [Figure: on each of Machine 1 and Machine 2, Bar is split into four partitions: Bar_a, Bar_b, Bar_c, Bar_d.]
  • 15. Parallel Evaluation. The rule Foo[a](c) :- Bar[a](b), Qux[b](c). is expanded into one rule per partition of Bar:
      Machine 1:
        Foo[a](c) :- Bar1a[a](b), Qux[b](c).
        Foo[a](c) :- Bar1b[a](b), Qux[b](c).
        Foo[a](c) :- Bar1c[a](b), Qux[b](c).
        Foo[a](c) :- Bar1d[a](b), Qux[b](c).
      Machine 2:
        Foo[a](c) :- Bar2a[a](b), Qux[b](c).
        Foo[a](c) :- Bar2b[a](b), Qux[b](c).
        Foo[a](c) :- Bar2c[a](b), Qux[b](c).
        Foo[a](c) :- Bar2d[a](b), Qux[b](c).
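The per-partition rules can be mimicked on one machine by evaluating the same join over slices of Bar in separate worker threads and taking the union of the partial results. This is only a sketch: the slicing scheme and the helper names are assumptions, and CPython threads illustrate the task structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

bar = [(1, 2), (1, 3), (8, 4), (8, 7), (9, 11)]
qux = {2: [9, 10], 5: [4], 10: [7], 11: [9]}  # Qux indexed by its first column

def split(rows, parts):
    """Split rows into `parts` slices (stand-ins for Bar_a .. Bar_d)."""
    return [rows[i::parts] for i in range(parts)]

def eval_on_slice(bar_slice):
    """Evaluate Foo[a](c) :- Bar_k[a](b), Qux[b](c) for one slice of Bar."""
    return {(a, c) for a, b in bar_slice for c in qux.get(b, [])}

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(eval_on_slice, split(bar, 4)))

# The union of the per-slice results equals the result of the original rule.
foo = set().union(*partial_results)
```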
  • 16. Aggregation. Foo[a]($min(c)) :- Bar[a](b), Qux[b](c). The $min aggregate function is applied to tuples in Foo having the same first column value.
  • 17. Aggregation. Built-in aggregate functions: min, max, sum, avg, argmin. User-defined functions: in Java or Python.
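The effect of $min can be sketched in Python as a group-by on the first column that keeps the smallest joined value. The function name and the sample tuples are illustrative assumptions.

```python
def min_aggregate(bar, qux):
    """Evaluate Foo[a]($min(c)) :- Bar[a](b), Qux[b](c)."""
    qux_by_b = {}
    for b, c in qux:
        qux_by_b.setdefault(b, []).append(c)
    foo = {}
    for a, b in bar:
        for c in qux_by_b.get(b, []):
            # Keep only the smallest c seen so far for this value of a.
            if a not in foo or c < foo[a]:
                foo[a] = c
    return foo

bar = [(1, 2), (1, 3), (8, 4), (8, 7), (9, 11)]
qux = [(2, 9), (2, 10), (5, 4), (10, 7), (11, 9)]
foo = min_aggregate(bar, qux)
print(foo)  # {1: 9, 9: 9}
```

Without the aggregate, Foo would hold both (1, 9) and (1, 10); with $min only the smaller value survives for a = 1.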
  • 18. Recursive Rules. The head table also appears in the rule body: Foo(a,c) :- Foo(a,b), Bar(b,c). Semantics: the rule is executed repeatedly until there are no more changes to Foo.
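The "repeat until no changes" semantics can be sketched as a naive fixpoint loop in Python. The starting tuples and the function name are assumptions for illustration; a real engine would likely avoid re-deriving old tuples on every pass.

```python
def fixpoint(foo, bar):
    """Apply Foo(a,c) :- Foo(a,b), Bar(b,c) until Foo stops changing."""
    foo = set(foo)
    while True:
        derived = {(a, c) for a, b in foo for b2, c in bar if b == b2}
        if derived <= foo:      # nothing new was derived: fixpoint reached
            return foo
        foo |= derived

bar = {(1, 2), (2, 3), (3, 4)}
foo = fixpoint({(0, 1)}, bar)
print(sorted(foo))  # [(0, 1), (0, 2), (0, 3), (0, 4)]
```

Each pass extends the known pairs by one more hop through Bar, and the loop stops on the first pass that adds nothing.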
  • 19. Recursive Rules. Shortest-path algorithm using recursion and aggregation:
      `Edge(int s, (int t, double len)) indexby s.
       Path(int n, double dist) indexby n.`
      `Path(t, $min(d)) :- t=$SRC, d=0;
                       :- Path(n, d1), Edge(n, t, d2), d=d1+d2.`
    References: SociaLite: Datalog Extensions for Efficient Social Network Analysis, ICDE'13; Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis, VLDB'14.
  • 20. Python Integration (Jython). SociaLite queries are embedded in Python code; queries are quoted in backticks. Python ↔ SociaLite: Python functions and variables are accessible in SociaLite queries, and SociaLite tables are readable from Python.
  • 21. Python Integration.
      print "This is Python code!"
  • 22. Python Integration.
      print "This is Python code!"
      # now we use SociaLite queries below
      `Foo[int i](String s).
       Foo[i](s) :- i=42, s="the answer".`
  • 23. Python Integration.
      print "This is Python code!"
      # now we use SociaLite queries below
      `Foo[int i](String s).
       Foo[i](s) :- i=42, s="the answer".`
      v = "Python variable"
      `Foo[i](s) :- i=43, s=$v.`
  • 24. Python Integration.
      print "This is Python code!"
      # now we use SociaLite queries below
      `Foo[int i](String s).
       Foo[i](s) :- i=42, s="the answer".`
      v = "Python variable"
      `Foo[i](s) :- i=43, s=$v.`
      @returns(str)
      def func(): return "Python func"
      `Foo[i](s) :- i=44, s=$func().`
  • 25. Python Integration.
      print "This is Python code!"
      # now we use SociaLite queries below
      `Foo[int i](String s).
       Foo[i](s) :- i=42, s="the answer".`
      v = "Python variable"
      `Foo[i](s) :- i=43, s=$v.`
      @returns(str)
      def func(): return "Python func"
      `Foo[i](s) :- i=44, s=$func().`
      for i, s in `Foo[i](s)`:
          print i, s
  • 26. Analysis Algorithms. Graph algorithms: shortest paths, PageRank. Data mining algorithms: K-Means, logistic regression.
  • 27. Graph Algorithm: Shortest Path.
      `Edge(int s, (int t, double len)) indexby s.
       Path(int n, double dist) indexby n.`
      `Path(t, $min(d)) :- t=$SRC, d=0;
                       :- Path(n, d1), Edge(n, t, d2), d=d1+d2.`
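For readers unfamiliar with Datalog, the shortest-path rule can be read as a relaxation loop that runs until no distance improves. The plain-Python sketch below follows that reading; it is a Bellman-Ford-style illustration under the assumption of non-negative edge lengths, not the evaluation strategy SociaLite itself uses, and the sample graph is made up.

```python
def shortest_paths(edges, src):
    """Iterate Path(t, $min(d)) to a fixpoint over (n, t, length) edges."""
    path = {src: 0.0}                       # Path(t, $min(d)) :- t=$SRC, d=0
    changed = True
    while changed:
        changed = False
        for n, t, d2 in edges:              # :- Path(n, d1), Edge(n, t, d2), d=d1+d2
            if n in path:
                d = path[n] + d2
                if t not in path or d < path[t]:   # $min keeps the smaller distance
                    path[t] = d
                    changed = True
    return path

edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
dist = shortest_paths(edges, 0)
print(dist)  # {0: 0.0, 1: 3.0, 2: 1.0, 3: 8.0}
```

Node 1 is first reached at distance 4.0 and then improved to 3.0 via node 2, which is exactly the replacement that the $min aggregate performs.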
  • 28. Graph Algorithm: PageRank.
      `Rank(n, 0, r) :- Node(n), r=1.0/$N.`
      for t in range(30):
          `Rank(pi, $t+1, $sum(r)) :- Node(pi), r=0.15*1.0/$N;
                                   :- Rank(pj, $t, r1), Edge(pj, pi), EdgeCnt(pj, cnt), r=0.85*r1/cnt.`
  • 29. Graph Algorithm: PageRank. Same code as the previous slide, with the note: d = damping factor (we used 0.85).
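The PageRank rules translate almost line for line into a plain-Python loop. The sketch below assumes every node has at least one outgoing edge and uses a made-up three-node cycle; the function name and parameters are illustrative.

```python
def pagerank(nodes, edges, iterations=30, d=0.85):
    """Iterative PageRank mirroring the Rank rules; d is the damping factor."""
    n = len(nodes)
    out_degree = {u: 0 for u in nodes}               # EdgeCnt(pj, cnt)
    for src, _ in edges:
        out_degree[src] += 1
    rank = {u: 1.0 / n for u in nodes}               # Rank(n, 0, r) :- Node(n), r=1.0/$N
    for _ in range(iterations):
        nxt = {u: (1 - d) / n for u in nodes}        # r = 0.15*1.0/$N
        for src, dst in edges:
            nxt[dst] += d * rank[src] / out_degree[src]   # r = 0.85*r1/cnt, summed by $sum
        rank = nxt
    return rank

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]
rank = pagerank(nodes, edges)
```

On a symmetric cycle every node keeps the same rank of one third, which makes the example easy to check by hand.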
  • 30. Data Mining Algorithm: K-Means.
      for i in range(50):
          `Center(cid, $avg(p)) :- Data(id, p), Cluster(id, $i, c), cid=c.value.`
          `Cluster(id, $i+1, $argmin(idx, d)) :- Data(id, p), Center(idx, a), d=$getDiff(p, a).`
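The two K-Means rules correspond to the two steps of Lloyd's algorithm: average the points of each cluster, then reassign each point to its nearest center. A minimal plain-Python sketch follows; it assumes initial centers are supplied (so it assigns before averaging), uses squared Euclidean distance in place of $getDiff, and the sample points are made up.

```python
def kmeans(points, centers, iterations=50):
    """Lloyd's algorithm on tuples of floats; `centers` holds the initial centers."""
    def sq_dist(p, q):                      # stands in for $getDiff
        return sum((x - y) ** 2 for x, y in zip(p, q))

    assignment = []
    for _ in range(iterations):
        # Cluster(id, i+1, $argmin(idx, d)): nearest center for each point.
        assignment = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                      for p in points]
        # Center(cid, $avg(p)): mean of the points assigned to each cluster.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])   # keep the center of an empty cluster
        centers = new_centers
    return centers, assignment

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignment = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```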
  • 31. Data Mining Algorithm: Logistic Regression.
      for i in range(0, 100):
          `Gradient($i, $sum(w)) :- Data(id, p), Weight($i, w1), dot=$dot(w1, p), y=$sigmoid(dot), w=$computeWeights(p, y).`
          `Weight($i+1, w) :- Weight($i, w1), Gradient($i, g), w=$vecSum(w1, g).`
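The logistic regression rules sum a per-example term into a gradient and then add it to the weights. The sketch below shows one plausible reading as batch gradient descent in plain Python. The explicit label list, the learning rate `lr`, and the body of the per-example term are assumptions, since the slide does not show what $computeWeights does.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, labels, iterations=100, lr=0.1):
    """Batch gradient descent for logistic regression (assumed update rule)."""
    w = [0.0] * len(data[0])
    for _ in range(iterations):
        grad = [0.0] * len(w)                         # Gradient($i, $sum(w))
        for p, label in zip(data, labels):
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, p)))   # $dot, then $sigmoid
            for j, xj in enumerate(p):
                grad[j] += lr * (label - y) * xj      # assumed per-example term
        w = [wi + gi for wi, gi in zip(w, grad)]      # Weight($i+1, w) via $vecSum
    return w

# Each feature vector carries a constant 1.0 as a bias term.
data = [(1.0, -2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 2.0)]
labels = [0, 0, 1, 1]
w = train(data, labels)
predictions = [int(sigmoid(sum(wi * xi for wi, xi in zip(w, p))) > 0.5) for p in data]
```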
  • 32. Evaluation. Single-thread performance; multi-thread performance (on a 16-core machine); distributed performance (up to 64 machines).
  • 33. Single-thread. [Figure: Optimized Java vs SociaLite on Shortest Paths, PageRank, Mutual Neighbors, Connected Components, Triangles, and Clustering Coefficients.] SociaLite is as fast as highly optimized Java, or ~30% slower than optimized C++.
  • 34. Multi-thread. [Figure: six panels (PageRank, Mutual Neighbors, Connected Components, Triangle, Clustering Coefficients, Shortest Paths), each plotting execution time and parallelization speedup, with ideal speedup for reference, against the number of threads from 1 to 16.]
  • 35. Distributed. [Figure: four panels (Breadth First Search, PageRank, Triangle, Collaborative Filtering) comparing Native, CombBLAS, GraphLab, SociaLite, and Giraph on 1, 4, 16, and 64 machines; y-axes show execution time (sec.) or time per iteration (sec.).]
  • 36. Summary. SociaLite is a distributed query language: easy and efficient; integration with Python; algorithms in SociaLite (graph, data mining); competitive performance.
  • 38. Demo. Two experimental front-ends: IPython and Gephi. GitHub data analysis with SociaLite + Gephi: a project/developer network, with an edge if a developer contributes to a project.
  • 40. System Optimizations. Custom memory allocator (temporary tables); optimized serialization; direct ByteBuffer (network buffer); multiple network channels among workers.
  • 42. System Overview (standalone mode). [Diagram: Python Integration (preprocessing), Compiler, Eval Task Builder, and worker threads.]
  • 43. System Overview (distributed mode). [Diagram: a Master and multiple Workers over a Distributed File System (HDFS).]
  • 44. Approximation. A table column can be a Bloom filter or a sketch.
  • 45. Approximation: Bloom Filter. A probabilistic set data structure. Elements are represented as bits, so they cannot be enumerated. Set membership is computed quickly but approximately: false positives are possible, false negatives are not.
  • 46. Approximation. Analysis example: in a social network (friendship), take each person's friends-of-friends and count the number of them who work at a startup. Call it a Startup Score.
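The properties listed for Bloom filters can be seen in a small Python sketch. The sizes, the salted SHA-256 hashing scheme, and the class name are illustrative choices and say nothing about how SociaLite implements its Bloom filter columns.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: membership tests may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

friends = BloomFilter()
for person in ["alice", "bob", "carol"]:
    friends.add(person)
```

Only bit positions are stored, so the inserted names cannot be listed back, and the memory footprint stays fixed no matter how many elements are added.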
  • 47. Approximation.
      Foaf(i, f) :- Friend(i, f).
      Foaf(i, ff) :- Friend(i, f), Friend(f, ff).
      StartupScore(i, $inc(1)) :- Foaf(i, ff), WorkAt(ff, "Startup").
  • 48. Approximation. Same rules as the previous slide; the 2nd column of the Foaf table is represented with a Bloom filter.
  • 49. Approximation. Same rules, with the 2nd column of the Foaf table represented with a Bloom filter. Results:
                               Exact     Approximation   Comparison
      Exec time (min)          28.9      19.4            32.8% faster
      Memory usage (GB)        26.0      3.0             11.5% usage
      Accuracy (<10% error)    100.0%    92.5%
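The exact (non-approximate) version of the Startup Score query can be sketched in plain Python with sets. The sample names and the one-employer-per-person dictionary are assumptions for illustration.

```python
def startup_scores(friend, work_at):
    """Count, per person, the friends and friends-of-friends who work at a startup."""
    friends_of = {}
    for i, f in friend:
        friends_of.setdefault(i, set()).add(f)

    scores = {}
    for i, direct in friends_of.items():
        foaf = set(direct)                    # Foaf(i, f) :- Friend(i, f)
        for f in direct:                      # Foaf(i, ff) :- Friend(i, f), Friend(f, ff)
            foaf |= friends_of.get(f, set())
        # StartupScore(i, $inc(1)) :- Foaf(i, ff), WorkAt(ff, "Startup")
        score = sum(1 for ff in foaf if work_at.get(ff) == "Startup")
        if score:
            scores[i] = score
    return scores

friend = [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("cat", "ann")]
work_at = {"ann": "Startup", "bob": "Startup", "cat": "BigCo", "dan": "Startup"}
scores = startup_scores(friend, work_at)
print(scores)  # {'ann': 2, 'bob': 2, 'cat': 2}
```

The per-person `foaf` sets are what grow large on a real social network, which is the column the slides replace with a Bloom filter to save memory.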
  • 50. System Overview. [Diagram: a Master and multiple Workers over a Distributed File System; arrows are labeled "query" and "compiled query".]
  • 52. System Components (Master Node, Worker Node). Query compiler: parser, analyzer, code generator (Java source code), bytecode compiler. Task scheduler. Worker threads. Network I/O threads.