SlideShare ist ein Scribd-Unternehmen logo
1 von 45
on Wide-Column Stores
M. Sc. Johannes Schildgen
2015-06-29
schildgen@cs.uni-kl.de
Incremental Data Transformations
with
"A DBA walks into a
NoSQL bar, but turns
and leaves because he
couldn't find a table"
Column Families
RowId info children
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
Peter, 1965,
IBM, 70k
Lisa, 1997,
BSIT
Column Families
€10 €5
€0€7
HBase API
put ‘pers‘, ‘Carl‘,
‘info:born‘, ‘1982‘
put ‘pers‘, ‘Carl‘,
‘info:school‘, ‘BSIT‘
put ‘pers‘, ‘Carl‘,
‘info:school‘, ‘BUIT‘
get ‘pers‘, ‘Carl‘
Jaspersoft HBase QL
{
"tableName": "pers",
"deserializerClass": "com.jaspersoft…DefaultDeserializer",
"filter": {
"SingleColumnValueFilter": {
"family": „info",
"qualifier": „school",
"compareOp": "EQUAL",
"comparator": { "SubstringComparator": { "substr":
„BSIT" } }
}
}
}
𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′ 𝐩𝐞𝐫𝐬
http://community.jaspersoft.com/wiki/jaspersoft-hbase-query-language
Phoenix
SELECT * FROM pers WHERE school = ‘BSIT‘
𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′ 𝐩𝐞𝐫𝐬
https://github.com/forcedotcom/phoenix
„Parent of
each person?“
Input Table Output Table
Column
RowID Value
Input Cell Output Cell
Column
RowID Value
_c
_r _v
_c
_r _v
Input Cell Output Cell
_c
born
_r
Lisa
_v
1997
_c
born
_r
Lisa
_v
1997
Input Cell Output Cell
𝐩𝐞𝐫𝐬
OUT._r <- IN._r,
OUT.born <- IN.born;
𝝅 𝒃𝒐𝒓𝒏
_c
born
_r
Lisa
_v
1997
_c
born
_r
Lisa
_v
1997
Input Cell Output Cell
𝐩𝐞𝐫𝐬
OUT._r <- IN._r,
OUT.born <- IN.born,
OUT.school <- IN.school;
𝝅 𝒃𝒐𝒓𝒏,𝒔𝒄𝒉𝒐𝒐𝒍
_c
born
_r
Lisa
_v
1997
_c
born
_r
Lisa
_v
1997
Input Cell Output Cell
𝐩𝐞𝐫𝐬
OUT._r <- IN._r,
OUT.$(IN._c) <- IN._v;
𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′
_c
born
_r
Lisa
_v
1997
_c
born
_r
Lisa
_v
1997
Input Cell Output Cell
IN-FILTER: school=‘BSIT‘,
OUT._r <- IN._r,
OUT.$(IN._c) <- IN._v;
row predicate
𝐩𝐞𝐫𝐬𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′
That was:
Selection and Projection
Now:
Grouping
_c
cmpny
_r
Peter
_v
IBM
_c
salsum
_r
IBM
_v
645k
Input Cell Output Cell
Salary sum of
each company.
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
RowId info
Eve
Carl
Julia
Lisa
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary):
born cmpny salary
1965 IBM 70k
born cmpny job
1966 IBM intern
born cmpny salary
1967 IBM 80k
born school salary
1997 BSIT 1k
salsum
IBM 70k
salsum
IBM 80k
salsum
IBM 150k
Advanced Transformations:
More Filters
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
Peter, 1965,
IBM, 70k
Lisa, 1997,
BSIT
€10 €5
€0€7
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
OUT._r <- IN._r,
OUT.$(IN._c) <- IN._v;
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN._c) <- IN._v;
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN.children._c) <- IN._v;
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN.children._c?(@>5))
<- IN._v;
cell predicate
RowId info children
Peter
Lisa
born cmpny salary
1965 IBM 70k
Lisa Carl Susi Toni
€5 €0 €10 €7
born school
1997 BSIT
IN-FILTER: COL_COUNT(children)>0
OUT._r <- IN._r,
OUT.$(IN.children._c?(!Carl))
<- IN._v;
cell predicate
NotaQL Transformation Platform:
MapReduce
map(rowId, row)
row violates
row pred.?
has more
columns?
no
cell violates
cell pred.?
yes
map IN.{_r,_c,_v}, fetched columns
and constants to r,c and v
no
emit((r, c), v)
no
yes
Stop
yes
RowId info
Peter born cmpny salary
1965 IBM 70k
salsum
IBM 70k
((IBM, salsum), 70k)
reduce((r,c), {v})
put(r, c, aggregateAll(v))
Stop
((IBM, salsum), {70k, 80k, 10k})
((IBM, salsum), 160k)
Incremental Transformations:
Self-Maintainability
Salary sum of all people born
before 1980 per company.
IN-FILTER: born<1980,
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
17:45 17:47 17:50
Execute job New people
are added
Execute job
again
Δ+ =
Salary sum of all people born
before 1980 per company.
IN-FILTER: born<1980,
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
17:45 17:47 17:50
Execute job New people
are added
Execute job
again
salsum
IBM 150k
born cmpny salary
Melissa 1989 IBM 50k
Nora 1977 IBM 80k
salsum
IBM 230k+
Salary sum of all people born
before 1980 per company.
IN-FILTER: born<1980,
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
17:50 17:52 17:55
Execute job People
are deleted
Execute job
again
salsum
IBM 230k
born cmpny salary
Melissa 1989 IBM 50k
Nora 1977 IBM 80k
salsum
IBM 150k
-
Salary sum of all people born
before 1980 per company.
IN-FILTER: born<1980,
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
17:50 17:52 17:55
Execute job People
are updated
Execute job
again
born cmpny salary
Peter 1965 IBM 70k
1990
-= 70k
Salary sum of all people born
before 1980 per company.
IN-FILTER: born<1980,
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
17:50 17:52 17:55
Execute job People
are updated
Execute job
again
born cmpny salary
Peter 1965 IBM 70k
1990
+= 70k
1965
Salary sum of all people born
before 1980 per company.
IN-FILTER: born<1980,
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
17:50 17:52 17:55
Execute job People
are updated
Execute job
again
born cmpny salary
Peter 1965 IBM 70k
75k
+= 75k-70k
Salary sum of all people born
before 1980 per company.
IN-FILTER: born<1980,
OUT._r <- IN.cmpny,
OUT.salsum <- SUM(IN.salary);
17:50 17:52 17:55
Execute job People
are updated
Execute job
again
born cmpny salary
Peter 1965 IBM 70k
SAP
-= 70k
map(rowId, row)
row violates
row pred.?
has more
columns?
no
cell violates
cell pred.?
yes
map IN.{_r,_c,_v}, fetched columns
and constants to r,c and v
no
emit((r, c), v)
no
yes
Stop
yes
reduce((r,c), {v})
put(r, c, aggregateAll(v))
Stop
Just read Delta
(ts>ts‘)
map(rowId, row)
row violates
row pred.?
has more
columns?
no
cell violates
cell pred.?
yes
map IN.{_r,_c,_v}, fetched columns
and constants to r,c and v
no
emit((r, c), v)
no
yes
Stop
yes
reduce((r,c), {v})
put(r, c, aggregateAll(v))
Stop
…and the
former Result
map(rowId, row)
row violates
row pred.?
has more
columns?
no
cell violates
cell pred.?
yes
map IN.{_r,_c,_v}, fetched columns
and constants to r,c and v
no
emit((r, c), v)
no
yes
Stop
yes
reduce((r,c), {v})
put(r, c, aggregateAll(v))
Stop
Was it satisfied
on previous
execution?
Yes? Invert v
*
map(rowId, row)
row violates
row pred.?
has more
columns?
no
cell violates
cell pred.?
yes
map IN.{_r,_c,_v}, fetched columns
and constants to r,c and v
no
emit((r, c), v)
no
yes
Stop
yes
reduce((r,c), {v})
put(r, c, aggregateAll(v))
Stop
Unchanged?
Evaluation:
Performance
0
10
20
30
40
50
60
70
80
90
100
TPCH 1 (selection, projection, sum
of prices per status code)
TPCH 6 (selection, projection, sum
of all prices)
Reverse Web-Link Graph
Native Hadoop
NotaQL (non-inc.)
NotaQL (0.1%
changes, timestamp-
based CDC)
NotaQL (10%
changes, manual CDC)
Conclusion
• Selection, Projection
• Grouping, Aggregation
• Schema-Flexible
• Horizontal Aggregation
• Metadata↔Data
• Graph Processing
• Text Processing
≠SQL
Incremental!
Thank you!

Weitere ähnliche Inhalte

Kürzlich hochgeladen

Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书rnrncn29
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical trainingGladiatorsKasper
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfShreyas Pandit
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.elesangwon
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfalene1
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHSneha Padhiar
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmDeepika Walanjkar
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectGayathriM270621
 

Kürzlich hochgeladen (20)

Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training70 POWER PLANT IAE V2500 technical training
70 POWER PLANT IAE V2500 technical training
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdfComprehensive energy systems.pdf Comprehensive energy systems.pdf
Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACHTEST CASE GENERATION GENERATION BLOCK BOX APPROACH
TEST CASE GENERATION GENERATION BLOCK BOX APPROACH
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
 
STATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subjectSTATE TRANSITION DIAGRAM in psoc subject
STATE TRANSITION DIAGRAM in psoc subject
 

Empfohlen

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 

Empfohlen (20)

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 

Incremental Data Transformations on Wide-Column Stores with NotaQL

  • 1. on Wide-Column Stores M. Sc. Johannes Schildgen 2015-06-29 schildgen@cs.uni-kl.de Incremental Data Transformations with
  • 2. "A DBA walks into a NoSQL bar, but turns and leaves because he couldn't find a table"
  • 4. RowId info children Peter Lisa born cmpny salary 1965 IBM 70k Lisa Carl Susi Toni €5 €0 €10 €7 born school 1997 BSIT Peter, 1965, IBM, 70k Lisa, 1997, BSIT Column Families €10 €5 €0€7
  • 5. HBase API put ‘pers‘, ‘Carl‘, ‘info:born‘, ‘1982‘ put ‘pers‘, ‘Carl‘, ‘info:school‘, ‘BSIT‘ put ‘pers‘, ‘Carl‘, ‘info:school‘, ‘BUIT‘ get ‘pers‘, ‘Carl‘
  • 6. Jaspersoft HBase QL { "tableName": "pers", "deserializerClass": "com.jaspersoft…DefaultDeserializer", "filter": { "SingleColumnValueFilter": { "family": „info", "qualifier": „school", "compareOp": "EQUAL", "comparator": { "SubstringComparator": { "substr": „BSIT" } } } } } 𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′ 𝐩𝐞𝐫𝐬 http://community.jaspersoft.com/wiki/jaspersoft-hbase-query-language
  • 7. Phoenix SELECT * FROM pers WHERE school = ‘BSIT‘ 𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′ 𝐩𝐞𝐫𝐬 https://github.com/forcedotcom/phoenix „Parent of each person?“
  • 8.
  • 10. Column RowID Value Input Cell Output Cell Column RowID Value
  • 11. _c _r _v _c _r _v Input Cell Output Cell
  • 12. _c born _r Lisa _v 1997 _c born _r Lisa _v 1997 Input Cell Output Cell 𝐩𝐞𝐫𝐬 OUT._r <- IN._r, OUT.born <- IN.born; 𝝅 𝒃𝒐𝒓𝒏
  • 13. _c born _r Lisa _v 1997 _c born _r Lisa _v 1997 Input Cell Output Cell 𝐩𝐞𝐫𝐬 OUT._r <- IN._r, OUT.born <- IN.born, OUT.school <- IN.school; 𝝅 𝒃𝒐𝒓𝒏,𝒔𝒄𝒉𝒐𝒐𝒍
  • 14. _c born _r Lisa _v 1997 _c born _r Lisa _v 1997 Input Cell Output Cell 𝐩𝐞𝐫𝐬 OUT._r <- IN._r, OUT.$(IN._c) <- IN._v; 𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′
  • 15. _c born _r Lisa _v 1997 _c born _r Lisa _v 1997 Input Cell Output Cell IN-FILTER: school=‘BSIT‘, OUT._r <- IN._r, OUT.$(IN._c) <- IN._v; row predicate 𝐩𝐞𝐫𝐬𝛔 𝐬𝐜𝐡𝐨𝐨𝐥=′ 𝐁𝐒𝐈𝐓′
  • 18. _c cmpny _r Peter _v IBM _c salsum _r IBM _v 645k Input Cell Output Cell Salary sum of each company. OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary);
  • 19. RowId info Eve Carl Julia Lisa OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary): born cmpny salary 1965 IBM 70k born cmpny job 1966 IBM intern born cmpny salary 1967 IBM 80k born school salary 1997 BSIT 1k salsum IBM 70k salsum IBM 80k salsum IBM 150k
  • 21. RowId info children Peter Lisa born cmpny salary 1965 IBM 70k Lisa Carl Susi Toni €5 €0 €10 €7 born school 1997 BSIT Peter, 1965, IBM, 70k Lisa, 1997, BSIT €10 €5 €0€7
  • 22. RowId info children Peter Lisa born cmpny salary 1965 IBM 70k Lisa Carl Susi Toni €5 €0 €10 €7 born school 1997 BSIT OUT._r <- IN._r, OUT.$(IN._c) <- IN._v;
  • 23. RowId info children Peter Lisa born cmpny salary 1965 IBM 70k Lisa Carl Susi Toni €5 €0 €10 €7 born school 1997 BSIT IN-FILTER: COL_COUNT(children)>0 OUT._r <- IN._r, OUT.$(IN._c) <- IN._v;
  • 24. RowId info children Peter Lisa born cmpny salary 1965 IBM 70k Lisa Carl Susi Toni €5 €0 €10 €7 born school 1997 BSIT IN-FILTER: COL_COUNT(children)>0 OUT._r <- IN._r, OUT.$(IN.children._c) <- IN._v;
  • 25. RowId info children Peter Lisa born cmpny salary 1965 IBM 70k Lisa Carl Susi Toni €5 €0 €10 €7 born school 1997 BSIT IN-FILTER: COL_COUNT(children)>0 OUT._r <- IN._r, OUT.$(IN.children._c?(@>5)) <- IN._v; cell predicate
  • 26. RowId info children Peter Lisa born cmpny salary 1965 IBM 70k Lisa Carl Susi Toni €5 €0 €10 €7 born school 1997 BSIT IN-FILTER: COL_COUNT(children)>0 OUT._r <- IN._r, OUT.$(IN.children._c?(!Carl)) <- IN._v; cell predicate
  • 28. map(rowId, row) row violates row pred.? has more columns? no cell violates cell pred.? yes map IN.{_r,_c,_v}, fetched columns and constants to r,c and v no emit((r, c), v) no yes Stop yes RowId info Peter born cmpny salary 1965 IBM 70k salsum IBM 70k ((IBM, salsum), 70k)
  • 29. reduce((r,c), {v}) put(r, c, aggregateAll(v)) Stop ((IBM, salsum), {70k, 80k, 10k}) ((IBM, salsum), 160k)
  • 31. Salary sum of all people born before 1980 per company. IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary); 17:45 17:47 17:50 Execute job New people are added Execute job again Δ+ =
  • 32. Salary sum of all people born before 1980 per company. IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary); 17:45 17:47 17:50 Execute job New people are added Execute job again salsum IBM 150k born cmpny salary Melissa 1989 IBM 50k Nora 1977 IBM 80k salsum IBM 230k+
  • 33. Salary sum of all people born before 1980 per company. IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary); 17:50 17:52 17:55 Execute job People are deleted Execute job again salsum IBM 230k born cmpny salary Melissa 1989 IBM 50k Nora 1977 IBM 80k salsum IBM 150k -
  • 34. Salary sum of all people born before 1980 per company. IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary); 17:50 17:52 17:55 Execute job People are updated Execute job again born cmpny salary Peter 1965 IBM 70k 1990 -= 70k
  • 35. Salary sum of all people born before 1980 per company. IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary); 17:50 17:52 17:55 Execute job People are updated Execute job again born cmpny salary Peter 1965 IBM 70k 1990 += 70k 1965
  • 36. Salary sum of all people born before 1980 per company. IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary); 17:50 17:52 17:55 Execute job People are updated Execute job again born cmpny salary Peter 1965 IBM 70k 75k += 75k-70k
  • 37. Salary sum of all people born before 1980 per company. IN-FILTER: born<1980, OUT._r <- IN.cmpny, OUT.salsum <- SUM(IN.salary); 17:50 17:52 17:55 Execute job People are updated Execute job again born cmpny salary Peter 1965 IBM 70k SAP -= 70k
  • 38. map(rowId, row) row violates row pred.? has more columns? no cell violates cell pred.? yes map IN.{_r,_c,_v}, fetched columns and constants to r,c and v no emit((r, c), v) no yes Stop yes reduce((r,c), {v}) put(r, c, aggregateAll(v)) Stop Just read Delta (ts>ts‘)
  • 39. map(rowId, row) row violates row pred.? has more columns? no cell violates cell pred.? yes map IN.{_r,_c,_v}, fetched columns and constants to r,c and v no emit((r, c), v) no yes Stop yes reduce((r,c), {v}) put(r, c, aggregateAll(v)) Stop …and the former Result
  • 40. map(rowId, row) row violates row pred.? has more columns? no cell violates cell pred.? yes map IN.{_r,_c,_v}, fetched columns and constants to r,c and v no emit((r, c), v) no yes Stop yes reduce((r,c), {v}) put(r, c, aggregateAll(v)) Stop Was it satisfied on previous execution? Yes? Invert v *
  • 41. map(rowId, row) row violates row pred.? has more columns? no cell violates cell pred.? yes map IN.{_r,_c,_v}, fetched columns and constants to r,c and v no emit((r, c), v) no yes Stop yes reduce((r,c), {v}) put(r, c, aggregateAll(v)) Stop Unchanged?
  • 43. 0 10 20 30 40 50 60 70 80 90 100 TPCH 1 (selection, projection, sum of prices per status code) TPCH 6 (selection, projection, sum of all prices) Reverse Web-Link Graph Native Hadoop NotaQL (non-inc.) NotaQL (0.1% changes, timestamp- based CDC) NotaQL (10% changes, manual CDC)
  • 44. Conclusion • Selection, Projection • Grouping, Aggregation • Schema-Flexible • Horizontal Aggregation • Metadata↔Data • Graph Processing • Text Processing ≠SQL Incremental!