Big Data
Structuring, Modeling, Managing
COURSE BY DAAN GERITS
CONTENT
1 ANATOMY
Dive into the techniques that make data systems scale
2 DATA AT SCALE
What is so different about working with data the traditional way vs. the big data way?
3 DATA MODELS
An overview of the most popular types of data models
4 ADVICE
So what to make of all this?
Course by Daan Gerits
Daan Gerits (@daangerits)
Data unicorn, technopreneur, founder
Data expert at design is dead
Co-Founder of Fitchain.io
Co-Founder of Bigdata.be
01
ANATOMY
Discover the techniques that make data systems scale
• Replication and Partitioning
• System load
Replication
What? Copy data across physical nodes.
Why? Improve reliability and fault tolerance.
How? Create replicas of the data and keep them in sync.
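The replication idea can be simulated in a single process. This is a minimal sketch that assumes synchronous replication, meaning a write is applied to every live replica before it returns. The class `ReplicatedStore` and its methods are hypothetical. Real systems must also bring a recovered replica back in sync, and this sketch leaves that out.

```python
import random
from typing import Any, Optional


class ReplicatedStore:
    """Hypothetical sketch of synchronous replication across in-memory "nodes".

    Every write is applied to all live replicas before write() returns, so any
    live replica can serve a read, and as long as one replica survives no
    acknowledged write is lost.
    """

    def __init__(self, replica_count: int = 3) -> None:
        self.replicas: list[dict[str, Any]] = [{} for _ in range(replica_count)]
        self.alive: list[bool] = [True] * replica_count

    def write(self, key: str, value: Any) -> None:
        # Keep the replicas in sync by applying the write to every live copy.
        for replica, is_alive in zip(self.replicas, self.alive):
            if is_alive:
                replica[key] = value

    def read(self, key: str) -> Optional[Any]:
        # Every live replica holds the latest value, so any of them can answer.
        live = [r for r, is_alive in zip(self.replicas, self.alive) if is_alive]
        if not live:
            raise RuntimeError("no live replicas left")
        return random.choice(live).get(key)

    def fail(self, index: int) -> None:
        # Simulate losing a node: its copy is no longer written to or read from.
        self.alive[index] = False


store = ReplicatedStore(replica_count=3)
store.write("customers/214", {"name": "Daan Gerits"})
store.fail(0)
store.fail(1)
print(store.read("customers/214"))  # still served by the last live replica
```

Synchronous replication trades write latency for consistency. Asynchronous replication acknowledges a write earlier, but it lets replicas lag behind the primary.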
Partitioning
What? Partition the data and distribute the partitions across physical nodes.
Why? Scale data systems.
How? Choose a logical partitioning key; records with the same partitioning key go to the same node.
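Hash partitioning is a common way to make "same partitioning key goes to same node" concrete. This is a minimal sketch that assumes a fixed number of nodes. The helper `partition_for` is hypothetical.

```python
import hashlib


def partition_for(key: str, num_nodes: int) -> int:
    """Map a logical partitioning key to a node index in [0, num_nodes).

    A stable hash (SHA-256) is used instead of the built-in hash(), because
    hash() of a str is salted per Python process, and every client must
    compute the same node for the same key.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


nodes: dict[int, list[str]] = {i: [] for i in range(3)}
for key in ["customers/214", "customers/583", "customers/214"]:
    nodes[partition_for(key, 3)].append(key)

# Both records keyed customers/214 end up on the same node.
print(nodes)
```

With `hash % num_nodes`, changing the number of nodes remaps most keys. Consistent hashing is the usual way to limit that data movement.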
Load
Read heavy: most operations are read operations.
Write heavy: most operations are write operations.
Balanced: roughly as many read operations as write operations.
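In practice, read and write counts are rarely exactly equal, so classifying a workload needs some tolerance. Below is a small sketch. The function name and the default 5% margin are assumptions and do not come from the course.

```python
def classify_load(reads: int, writes: int, margin: float = 0.05) -> str:
    """Classify a workload from its operation counts.

    `margin` is a hypothetical threshold: a read share within 0.5 +/- margin
    is treated as balanced.
    """
    total = reads + writes
    if total == 0:
        raise ValueError("no operations observed")
    read_share = reads / total
    if read_share > 0.5 + margin:
        return "read heavy"
    if read_share < 0.5 - margin:
        return "write heavy"
    return "balanced"


print(classify_load(reads=950, writes=50))   # read heavy
print(classify_load(reads=100, writes=900))  # write heavy
print(classify_load(reads=510, writes=490))  # balanced
```

The label is typically a design hint. Read-heavy loads tend to benefit from extra replicas that spread the reads. Write-heavy loads tend to benefit from partitioning that spreads the writes.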
How you store the data depends on how you query the data.
02
DATA AT SCALE
To seasoned data professionals, many of these techniques and approaches do not seem so different from what they have been doing for decades. So what is so different?
At the core of big data is the ability to deal with the volume, variety and velocity of data.
Big Data is all about new ways of thinking about data.
THINK DIFFERENT
OPERATIONAL: Automate your processes through the use of data.
BUSINESS: Change the metrics you use to measure success.
PERSONAL: Data makes people important again, and this doesn't stop with the customer.
TRADITIONAL APPROACH
[Diagram: one Supply feeding a single Model, which serves all Requests (Request ×3)]
Big Data Approach
[Diagram: one Supply feeding several Models (Model ×3), which serve the Requests (Request ×3)]
03
DATA MODELS
How you want to retrieve your data has an impact on how you store it. These data models provide near-standard approaches to doing so.
HOW DATA IS STORED
GRAPH: A data model built out of nodes and their connections
COLUMN FAMILY: A seriously powerful but complex data model, ideal for sparse data
KEY-VALUE (KV): A very simple data model mapping a key to a value
DOCUMENT: A data model where the structure of every value can be different
KEY-VALUE
KEY                        VALUE
users.214.name             Daan Gerits
users.214.birthdate        18/05/1983
users.214.roles            [user, admin]
users.214.isSubscribed     true
users.214.social.twitter   @daangerits
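The keys in this example encode the structure of a nested record as dotted paths. The sketch below produces the same pairs from a nested Python dict, assuming that nested objects become longer key paths and that scalars and lists are stored as single values. The function name `flatten` is hypothetical.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str) -> dict[str, Any]:
    """Flatten a nested dict into dotted keys such as users.214.social.twitter."""
    pairs: dict[str, Any] = {}
    for field, value in record.items():
        key = f"{prefix}.{field}"
        if isinstance(value, dict):
            # Nested objects become longer key paths rather than nested values.
            pairs.update(flatten(value, key))
        else:
            # Scalars and lists are stored as-is under a single key.
            pairs[key] = value
    return pairs


user = {
    "name": "Daan Gerits",
    "birthdate": "18/05/1983",
    "roles": ["user", "admin"],
    "isSubscribed": True,
    "social": {"twitter": "@daangerits"},
}
for key, value in flatten(user, "users.214").items():
    print(key, value)
```

Because all of a user's keys share the `users.214.` prefix, an ordered store can fetch the whole user with a single prefix scan.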
Fast lookups
- But no way to query the data
- Scanning is possible if the keys are ordered
Flexible value types
- Keys and values can be anything, even collections and more complex data structures
Easy to scale
- Little to no dependencies between key-value pairs
- Ordering can become difficult to scale
Use cases
- Caches
- Configuration
SCAN <prefix>: Scan through all pairs whose key starts with the given prefix. This is only possible if the keys are ordered.
GET <key>: Get a key-value pair by its key.
SET <key> <value>: Set the value of the given key.
DELETE <key>: Remove the pair with the given key.
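The four operations can be sketched as a minimal in-memory store. This sketch assumes a single node that keeps its keys in a sorted Python list, which is what makes SCAN <prefix> a cheap range read. The class and method names are hypothetical.

```python
import bisect
from typing import Any, Iterator


class OrderedKVStore:
    """Hypothetical single-node key-value store that keeps its keys sorted."""

    def __init__(self) -> None:
        self._keys: list[str] = []          # all keys, in sorted order
        self._values: dict[str, Any] = {}   # key -> value, for O(1) GET

    def set(self, key: str, value: Any) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping the order
        self._values[key] = value

    def get(self, key: str) -> Any:
        return self._values.get(key)        # None if the key is absent

    def delete(self, key: str) -> None:
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, prefix: str) -> Iterator[tuple[str, Any]]:
        # Binary-search to the first key >= prefix, then walk forward while
        # the keys still share the prefix; ordering is what makes this cheap.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


store = OrderedKVStore()
store.set("users.214.name", "Daan Gerits")
store.set("users.214.birthdate", "18/05/1983")
store.set("users.583.name", "Wim Van Leuven")
print(list(store.scan("users.214.")))
# [('users.214.birthdate', '18/05/1983'), ('users.214.name', 'Daan Gerits')]
```

A sorted list keeps the sketch short, but it makes each insert O(n). Production stores usually keep keys ordered in structures such as B-trees or LSM trees.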
DOCUMENT
KEY    DOCUMENT
daan   { "name": "Daan Gerits", "birthday": "18/05/1983" }
wim    { "name": "Wim Van Leuven", "company": "Highestpoint" }
Queryable
- Technology-specific query language
- A separate index needs to be kept in sync
Flexible value types
- Key can be anything
- Value is a structured type (JSON, BSON, XML, …)
Scalability requires caution
- Relationships between documents
- Scaling search can become a hurdle
Use cases
- Search engines
- Entity data stores
FIND <query>: Find all documents matching the given query.
GET <key>: Get the document with the given key.
CREATE <key> <document>: Create a new document with the given key.
UPDATE <key> <field> <value>: Update the given field within the document with the given key.
DELETE <key>: Remove the document with the given key.
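These operations can be sketched as an in-memory store that also maintains the separate index the document model needs for querying. The sketch makes several assumptions: documents are Python dicts, FIND only supports field-equality queries, and only hashable field values are indexed. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Optional


class DocumentStore:
    """Hypothetical in-memory document store with a secondary index.

    The index maps field -> value -> set of document keys. It is updated on
    every write, so FIND can use it instead of scanning every document.
    """

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}
        self._index: defaultdict = defaultdict(lambda: defaultdict(set))

    def _update_index(self, key: str, add: bool) -> None:
        for field, value in self._docs[key].items():
            try:
                bucket = self._index[field][value]
            except TypeError:  # unhashable value such as a list: not indexed
                continue
            if add:
                bucket.add(key)
            else:
                bucket.discard(key)

    def create(self, key: str, document: dict[str, Any]) -> None:
        if key in self._docs:
            raise KeyError(f"document {key!r} already exists")
        self._docs[key] = dict(document)
        self._update_index(key, add=True)

    def get(self, key: str) -> Optional[dict[str, Any]]:
        return self._docs.get(key)

    def update(self, key: str, field: str, value: Any) -> None:
        self._update_index(key, add=False)  # drop stale index entries first
        self._docs[key][field] = value
        self._update_index(key, add=True)   # then index the new state

    def delete(self, key: str) -> None:
        self._update_index(key, add=False)
        del self._docs[key]

    def find(self, query: dict[str, Any]) -> list[str]:
        """Keys of documents whose fields equal every value in the query."""
        if not query:
            return sorted(self._docs)
        matches = None
        for field, value in query.items():
            keys = self._index.get(field, {}).get(value, set())
            matches = set(keys) if matches is None else matches & keys
        return sorted(matches)


store = DocumentStore()
store.create("daan", {"name": "Daan Gerits", "birthday": "18/05/1983"})
store.create("wim", {"name": "Wim Van Leuven", "company": "Highestpoint"})
print(store.find({"company": "Highestpoint"}))  # ['wim']
```

Every write pays for the index upkeep so that reads do not have to scan. Keeping that index in sync is where the scaling caution comes from once documents are spread over many nodes.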
GRAPH
[Diagram: four nodes, 1 (Name: Daan, Type: Tutor), 2 (Name: Els, Type: Tutor), 3 (Name: bigdata, Type: Course) and 4 (Name: Amy, Type: Student), connected by edges labelled "teaches" (twice), "friend of" and "enrolled in"]
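The example graph can be written down as plain data and queried by traversal. This sketch assumes that the two `teaches` edges run from the tutors to the course and that the `enrolled in` edge runs from Amy to the course. The `friend of` edge is left out because its endpoints cannot be read from the slide. The variable and function names are hypothetical.

```python
from collections import defaultdict

# Node properties, keyed by node id (as in the example graph).
nodes = {
    1: {"Name": "Daan", "Type": "Tutor"},
    2: {"Name": "Els", "Type": "Tutor"},
    3: {"Name": "bigdata", "Type": "Course"},
    4: {"Name": "Amy", "Type": "Student"},
}

# Directed, typed edges: (type, source id, target id).
# Endpoints are assumed; "friend of" is omitted because its endpoints are unknown.
edges = [("teaches", 1, 3), ("teaches", 2, 3), ("enrolled in", 4, 3)]

outgoing = defaultdict(list)  # node id -> [(type, target id)]
incoming = defaultdict(list)  # node id -> [(type, source id)]
for edge_type, src, dst in edges:
    outgoing[src].append((edge_type, dst))
    incoming[dst].append((edge_type, src))


def tutors_of(student_id: int) -> list[str]:
    """Traverse student -[enrolled in]-> course <-[teaches]- tutor."""
    names = []
    for edge_type, course in outgoing[student_id]:
        if edge_type != "enrolled in":
            continue
        for in_type, tutor in incoming[course]:
            if in_type == "teaches":
                names.append(nodes[tutor]["Name"])
    return names


print(tutors_of(4))  # ['Daan', 'Els']
```

Each hop follows stored adjacency lists instead of joining tables, which is what "relationships are first class citizens" means in practice.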
Relationships are first-class citizens
- Graph traversal in a specific language
- Updating relationships is cheap
Easy concepts
- Node with properties
- Edge
Very hard to scale
- Golden Ratio
- Scaling requires deep knowledge of the data
Use cases
- Social modeling
- Metadata stores
LINK <type> <src-node-id> <target-node-id>: Create a new link of the given type from the source node to the target node.
UNLINK <type> <src-node-id> <target-node-id>: Remove the link of the given type from the source node to the target node.
GET <node-id>: Get the node with the given node id.
SET <node-id> <properties>: Set the properties of the node with the given id.
DELETE <node-id>: Remove the node with the given id.
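The graph operations can be sketched as a small in-memory property graph. In this sketch LINK and UNLINK each touch only two sets, which illustrates why updating relationships is cheap. The class and method names mirror the operation list but are hypothetical. Node ids are ints to match the example graph, and `neighbors` is an added helper for inspecting links.

```python
from typing import Any, Optional


class GraphStore:
    """Hypothetical in-memory property graph.

    Nodes carry a dict of properties. Links are typed, directed
    (type, source, target) triples kept in outgoing and incoming adjacency
    sets, so a traversal step is a set lookup rather than a join.
    """

    def __init__(self) -> None:
        self._nodes: dict[int, dict[str, Any]] = {}
        self._out: dict[int, set[tuple[str, int]]] = {}  # node -> {(type, target)}
        self._in: dict[int, set[tuple[str, int]]] = {}   # node -> {(type, source)}

    def set(self, node_id: int, properties: dict[str, Any]) -> None:
        self._nodes.setdefault(node_id, {}).update(properties)
        self._out.setdefault(node_id, set())
        self._in.setdefault(node_id, set())

    def get(self, node_id: int) -> Optional[dict[str, Any]]:
        return self._nodes.get(node_id)

    def link(self, edge_type: str, src: int, dst: int) -> None:
        if src not in self._nodes or dst not in self._nodes:
            raise KeyError("both nodes must exist before they can be linked")
        self._out[src].add((edge_type, dst))
        self._in[dst].add((edge_type, src))

    def unlink(self, edge_type: str, src: int, dst: int) -> None:
        self._out.get(src, set()).discard((edge_type, dst))
        self._in.get(dst, set()).discard((edge_type, src))

    def delete(self, node_id: int) -> None:
        # Remove every link touching the node first, so no dangling edges remain.
        for edge_type, dst in self._out.pop(node_id):
            self._in[dst].discard((edge_type, node_id))
        for edge_type, src in self._in.pop(node_id):
            self._out[src].discard((edge_type, node_id))
        del self._nodes[node_id]

    def neighbors(self, node_id: int, edge_type: str) -> list[int]:
        return sorted(dst for t, dst in self._out.get(node_id, set()) if t == edge_type)


g = GraphStore()
g.set(1, {"Name": "Daan", "Type": "Tutor"})
g.set(3, {"Name": "bigdata", "Type": "Course"})
g.link("teaches", 1, 3)
print(g.neighbors(1, "teaches"))  # [3]
g.delete(3)
print(g.neighbors(1, "teaches"))  # []
```

Scaling is the hard part this sketch avoids. Once nodes live on different machines, a single traversal can hop between them many times.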
COLUMN FAMILY
                | DEFAULT                       | INVOICES
KEY             | name           | birthday     | 2018/001             | 20../... | 2019/483
customers/214   | Daan Gerits    | 18/05/1983   | { total: 980.03, … } | ...      | { total: 38.73, … }
customers/583   | Wim Van Leuven | 10/05/1973   | { total: 20.83, … }  | ...      | { total: 378.60, … }
Seemingly trivial concepts
- Table, RowKey, Column Family, Column
Hard to reason about
- Dynamic column names
Optimize for retrieval
- Very fast
- All data, including related data, in one request
Use cases
- Analytical stores
SCAN <prefix>: Scan through all records whose key starts with the given prefix.
GET <key> <column_family> [, <column_family>]: Get the given column families for the given key.
SET <key> <value>: Set the value of the given key.
DELETE <key>: Remove the record with the given key.
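These operations can be sketched as a single-node in-memory table. The sketch deviates from the slide in one deliberate way: SET takes a column family and a column in addition to the key and value, because a value in this model lives at a (row, family, column) coordinate. The class, the method names and that signature are assumptions. Invoice values keep only the `total` field, since the other fields are elided in the example table.

```python
import bisect
from typing import Any, Iterator

Row = dict[str, dict[str, Any]]  # column family -> column -> value


class ColumnFamilyStore:
    """Hypothetical in-memory column-family table.

    Row keys are kept sorted so SCAN <prefix> is a range read. Each row only
    stores the columns it actually has, which suits sparse data, and column
    names (such as invoice ids) can be dynamic.
    """

    def __init__(self, families: list[str]) -> None:
        self._families = set(families)  # column families are fixed up front
        self._keys: list[str] = []      # sorted row keys
        self._rows: dict[str, Row] = {}

    def set(self, key: str, family: str, column: str, value: Any) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family {family!r}")
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].setdefault(family, {})[column] = value

    def get(self, key: str, *families: str) -> Row:
        # One request returns all columns of every requested family for the row.
        row = self._rows.get(key, {})
        return {family: dict(row[family]) for family in families if family in row}

    def delete(self, key: str) -> None:
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, prefix: str) -> Iterator[tuple[str, Row]]:
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


table = ColumnFamilyStore(["DEFAULT", "INVOICES"])
table.set("customers/214", "DEFAULT", "name", "Daan Gerits")
table.set("customers/214", "DEFAULT", "birthday", "18/05/1983")
table.set("customers/214", "INVOICES", "2018/001", {"total": 980.03})
table.set("customers/583", "DEFAULT", "name", "Wim Van Leuven")
print(table.get("customers/214", "INVOICES"))
# {'INVOICES': {'2018/001': {'total': 980.03}}}
```

Storing the invoices as columns of the customer row is what lets one GET return a customer together with related data, at the cost of choosing that shape before the queries arrive.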
04
ADVICE
So how to deal with all of this?
The data model for writing can differ from the data model for reading.
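One way to act on this advice is to store writes in an append-only log and derive a separate read model from it. This approach is often called CQRS or event sourcing. It is not named in the course, so treat this as one possible illustration. The sketch is minimal, all names in it are hypothetical, and the figures are the two invoice totals for customers/214 from the column family example.

```python
from collections import defaultdict

# Write model: an append-only log of invoice events. Appending is cheap and
# keeps the full history, which suits the write path.
invoice_log: list[dict] = []

# Read model: one precomputed total per customer, so answering
# "what has this customer spent?" is a single key lookup.
totals_by_customer: dict[str, float] = defaultdict(float)


def record_invoice(customer: str, invoice_id: str, total: float) -> None:
    # 1. Store the fact in the write-optimized shape.
    invoice_log.append({"customer": customer, "invoice": invoice_id, "total": total})
    # 2. Project it into the read-optimized shape.
    totals_by_customer[customer] += total


def customer_total(customer: str) -> float:
    return totals_by_customer.get(customer, 0.0)


def rebuild_read_model(log: list[dict]) -> dict[str, float]:
    # The read model is derived data, so it can always be rebuilt from the log.
    totals: dict[str, float] = defaultdict(float)
    for event in log:
        totals[event["customer"]] += event["total"]
    return dict(totals)


record_invoice("customers/214", "2018/001", 980.03)
record_invoice("customers/214", "2019/483", 38.73)
print(customer_total("customers/214"))
```

Because the read model is derived, a new question can be served by adding another projection over the same log instead of reshaping the stored data.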
Always start from the questions you need to answer.
If you need a join, you most likely did it wrong!
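The usual way to avoid the join is to denormalize, storing related data together at write time as the column family example does with invoices inside the customer row. The sketch below contrasts the two shapes. The dictionaries and function names are hypothetical, and it uses only the two invoices of customers/214 that are visible in that example.

```python
# Normalized: reading a customer's invoices needs a join across two collections.
customers = {"214": {"name": "Daan Gerits"}}
invoices = [
    {"id": "2018/001", "customer_id": "214", "total": 980.03},
    {"id": "2019/483", "customer_id": "214", "total": 38.73},
]


def invoices_with_join(customer_id: str) -> list[tuple[str, str, float]]:
    name = customers[customer_id]["name"]
    return [
        (name, inv["id"], inv["total"])
        for inv in invoices
        if inv["customer_id"] == customer_id  # the join condition
    ]


# Denormalized: the invoices are stored inside the customer record at write
# time, so the same question is answered by a single key lookup.
customers_by_key = {
    "customers/214": {
        "name": "Daan Gerits",
        "invoices": {"2018/001": 980.03, "2019/483": 38.73},
    }
}


def invoices_without_join(key: str) -> list[tuple[str, str, float]]:
    row = customers_by_key[key]
    return [(row["name"], inv_id, total) for inv_id, total in row["invoices"].items()]


print(invoices_without_join("customers/214"))
```

The trade-off is that the data is now shaped for this one question. A different question, such as all invoices of one year across customers, typically needs another copy shaped for it, which is why the shape should follow the questions you need to answer.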
Questions?
@daangerits

Course 4: Big Data Structuring, Integration and Management Systems by Daan Gerits