For more info about our Big Data courses, check out our website ➡️ https://www.betacowork.com/big-data/
---------
"Data is the new oil" - Many companies and professionals do not know how to use their data or are not aware of the added value they could gain from it.
It is in response to these problems that the project “Brussels: The Beating Heart of Big Data” was born.
This project, financed by the Region of Brussels Capital and organised by Betacowork, offers 3 training cycles of 10 courses on big data, at both beginner and advanced levels. These 3 cycles will be followed by a Hackathon weekend.
No prerequisites are required to start these courses. The aim of these courses is to familiarize participants with the principles of Big Data.
------
2. CONTENT
Dive into the techniques that make data systems scale
1
ANATOMY
2
DATA AT SCALE
How does working with data the big data way differ from
the traditional way?
3
DATA MODELS
An overview of the most popular types of data models
4
ADVICE
So what to make of all this?
3. Course by Daan Gerits
Data expert at design is dead
Co-Founder of Fitchain.io
data unicorn,
technopreneur,
founder
Daan Gerits
@daangerits
Co-Founder of Bigdata.be
https://pbs.twimg.com/profile_images/431014702533976064/7RZOwlpH_400x400.jpeg
5. Course by Daan Gerits
What?
Copy data across physical nodes
Why?
Improve reliability and fault tolerance
How?
Create replicas of the data and keep those in sync
Replication
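To make the "create replicas and keep them in sync" idea concrete, here is a minimal sketch, assuming synchronous replication where every write is applied to all replicas. The class name `ReplicatedStore`, the `failed` parameter and the use of in-memory dicts as stand-ins for physical nodes are illustrative assumptions, not part of the course material.

```python
class ReplicatedStore:
    """Toy synchronous replication: every write goes to all replicas."""

    def __init__(self, replica_count=3):
        # Each dict stands in for one physical node (assumption for illustration).
        self.replicas = [dict() for _ in range(replica_count)]

    def set(self, key, value):
        # Keep the replicas in sync by applying the write to every node.
        for replica in self.replicas:
            replica[key] = value

    def get(self, key, failed=()):
        # Read from the first replica that is not marked as failed.
        for index, replica in enumerate(self.replicas):
            if index not in failed:
                return replica[key]
        raise RuntimeError("no replica available")


store = ReplicatedStore()
store.set("user:1", "Ada")
# Nodes 0 and 1 are simulated as down; the read still succeeds from node 2.
value = store.get("user:1", failed={0, 1})
```

Real systems often replicate asynchronously or wait for only a quorum of nodes, which trades some consistency for lower write latency.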
6. Course by Daan Gerits
What?
Partition the data and distribute across physical
nodes
Why?
Scale data systems
How?
Logical partitioning key
Same partitioning key goes to same node
Partitioning
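The rule "same partitioning key goes to same node" can be sketched with hash partitioning. This is only one possible scheme, shown under assumptions: the function name `node_for`, the choice of MD5 as a stable hash and the four in-memory nodes are all hypothetical.

```python
import hashlib


def node_for(partition_key, node_count):
    # Use a stable hash: Python's built-in hash() for strings varies per process.
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


# Four dicts stand in for four physical nodes (assumption for illustration).
nodes = [dict() for _ in range(4)]


def put(partition_key, record_id, record):
    # All records sharing a partitioning key land on the same node.
    target = nodes[node_for(partition_key, len(nodes))]
    target[(partition_key, record_id)] = record


put("customer-42", 1, {"total": 10})
put("customer-42", 2, {"total": 25})
```

A plain modulo scheme like this moves most keys when the number of nodes changes, which is why production systems tend to use consistent hashing or range-based partitioning instead.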
7. Course by Daan Gerits
Read Heavy
Most of the operations are read operations
Write Heavy
Most of the operations are write operations
Balanced
Roughly as many read operations as write operations
Load
8. Course by Daan Gerits
How you store the data
depends on how you
query the data
9. 02
DATA AT SCALE
To seasoned data professionals a lot of the techniques and
approaches do not seem so different to what they have done
during the past decades. So what is so different?
10. Course by Daan Gerits
At the core of big data
is the ability to deal
with the volume,
variety and velocity of
data.
11. Course by Daan Gerits
Big Data is all about
new ways of thinking
about data
12. THINK DIFFERENT
OPERATIONAL
Automate your
processes through the
use of data
BUSINESS
Change the metrics
you use to measure
success
PERSONAL
Data makes people
important again. This
doesn’t stop with the
customer
13. Course by Daan Gerits
TRADITIONAL APPROACH
[Diagram: Supply → one Model → three Requests]
14. Course by Daan Gerits
Big Data Approach
[Diagram: Supply → three Models → three Requests]
15. 03
DATA MODELS
How you want to retrieve your data has an impact on how you
store your data. These data models provide almost standard
approaches to do so.
16. HOW DATA IS STORED
GRAPH
Data model built out
of nodes and their
connections
COLUMN FAMILY
Seriously powerful
but complex data
model, ideal for
sparse data
KEY-VALUE
A very simple data
model mapping a key
to a value
DOCUMENT
A data model where
the structure of every
value can be different
18. Course by Daan Gerits
Fast Lookups
But no way to query the data
Scanning if keys are ordered
Flexible value types
Key and value can be anything, even collections and
more complex data structures
Easy to scale
- Little to no dependencies between key-value pairs
- Ordering can become difficult to scale
Use cases
- Caches
- Configuration
KEY-VALUE
19. Course by Daan Gerits
SCAN <prefix>
Scan through all pairs where the key matches the
given prefix. This is only possible if the keys are
ordered
GET <key>
Get a key-value pair by its key
SET <key> <value>
Set the value of the given key
DELETE <key>
Remove the pair with the given key
KEY-VALUE
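The four key-value operations above can be sketched in a few lines. This is a minimal in-memory illustration, assuming string keys; the class name `OrderedKeyValueStore` is hypothetical. Keeping the keys sorted is what makes SCAN possible.

```python
import bisect


class OrderedKeyValueStore:
    def __init__(self):
        self._keys = []    # sorted list of keys, needed for SCAN
        self._values = {}  # key -> value, gives fast GET

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys ordered on insert
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.remove(key)

    def scan(self, prefix):
        # Keys sharing a prefix are contiguous in sorted order,
        # so jump to the first candidate and stop at the first mismatch.
        start = bisect.bisect_left(self._keys, prefix)
        pairs = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            pairs.append((key, self._values[key]))
        return pairs


kv = OrderedKeyValueStore()
kv.set("config:db:host", "localhost")
kv.set("config:db:port", 5432)
kv.set("cache:page:1", "<html>")
db_settings = kv.scan("config:db:")
```

Note that the only way to find a value is through its key or key prefix, which matches the "no way to query the data" limitation on the previous slide.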
21. Course by Daan Gerits
Queryable
Technology specific query language
Separate index needs to be kept in sync
Flexible value types
Key can be anything
Value is structured type (JSON, BSON, XML, …)
Scalability requires caution
- Relationships between documents
- Scaling search can become a hurdle
Use cases
- Search engines
- Entity Data Stores
DOCUMENT
22. Course by Daan Gerits
FIND <query>
Find all documents matching the given query
GET <key>
Get the document matching the given key
CREATE <key> <document>
Create a new document with the given key
UPDATE <key> <field> <value>
Update the given field within the document with the
given key
DELETE <key>
Remove the document with the given key
DOCUMENT
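As a rough illustration of the document operations above, the sketch below stores documents as Python dicts. The class name `DocumentStore` and the query format (a dict of field/value pairs that must all match) are assumptions made for this example; real document stores each have their own query language.

```python
class DocumentStore:
    def __init__(self):
        self._docs = {}  # key -> document (a dict with arbitrary fields)

    def create(self, key, document):
        if key in self._docs:
            raise KeyError(f"document {key!r} already exists")
        self._docs[key] = dict(document)

    def get(self, key):
        return self._docs.get(key)

    def update(self, key, field, value):
        # Change a single field without rewriting the whole document.
        self._docs[key][field] = value

    def delete(self, key):
        self._docs.pop(key, None)

    def find(self, query):
        # Full scan: without a separate index every document is inspected.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in query.items())
        ]


docs = DocumentStore()
docs.create("u1", {"name": "Ada", "city": "Brussels"})
docs.create("u2", {"name": "Linus", "city": "Ghent", "age": 30})
docs.update("u2", "city", "Brussels")
in_brussels = docs.find({"city": "Brussels"})
```

The two documents have different structures, which the model allows. The full scan in `find` hints at why a separate index is usually kept, and why it then has to stay in sync with the documents.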
24. Course by Daan Gerits
Relationships are first class citizens
Graph traversal in specific language
Updating relationships is cheap
Easy concepts
Node with properties
Edge
Very hard to scale
Golden Ratio
Scaling requires deep knowledge of the data
Use cases
- Social modeling
- Metadata stores
GRAPH
25. Course by Daan Gerits
LINK <type> <src-node-id> <target-node-id>
Create a new link with the given characteristics
UNLINK <type> <src-node-id> <target-node-id>
Remove the link with the given characteristics
GET <node-id>
Get the node with the given node id
SET <node-id> <properties>
Set the properties of the node with the given id
DELETE <node-id>
Remove the node with the given id
GRAPH
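The graph operations above can be sketched as follows. This is a minimal in-memory illustration; the class name `GraphStore` and the `neighbours` helper are hypothetical additions for the example, and links are assumed to be directed.

```python
class GraphStore:
    def __init__(self):
        self._nodes = {}     # node id -> properties
        self._links = set()  # (type, source node id, target node id)

    def set(self, node_id, properties):
        self._nodes[node_id] = dict(properties)

    def get(self, node_id):
        return self._nodes.get(node_id)

    def link(self, link_type, src, target):
        self._links.add((link_type, src, target))

    def unlink(self, link_type, src, target):
        self._links.discard((link_type, src, target))

    def delete(self, node_id):
        # Removing a node also removes every link that touches it.
        self._nodes.pop(node_id, None)
        self._links = {
            link for link in self._links if node_id not in (link[1], link[2])
        }

    def neighbours(self, node_id, link_type):
        # Hypothetical traversal helper: follow outgoing links of one type.
        return sorted(
            target for (kind, src, target) in self._links
            if kind == link_type and src == node_id
        )


graph = GraphStore()
graph.set("ada", {"name": "Ada"})
graph.set("linus", {"name": "Linus"})
graph.set("grace", {"name": "Grace"})
graph.link("FOLLOWS", "ada", "linus")
graph.link("FOLLOWS", "ada", "grace")
followed = graph.neighbours("ada", "FOLLOWS")
```

Adding or removing a link touches only one small entry, which reflects the "updating relationships is cheap" point. Once nodes live on different machines, a traversal may have to hop between them, which is one reason graphs are hard to scale.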
27. Course by Daan Gerits
Seemingly trivial concepts
Table, RowKey, Column Family, Column
Hard to reason about
Dynamic column names
Optimize for retrieval
Very fast
All data including related data in one request
Use cases
- Analytical stores
COLUMN FAMILY
28. Course by Daan Gerits
SCAN <prefix>
Scan through all records where the key matches the
given prefix.
GET <key> <column_family> [, <column_family>]
Get the given column families for the given key
SET <key> <value>
Set the value of the given key
DELETE <key>
Remove the record with the given key
COLUMN FAMILY
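To illustrate the column family operations above, here is a minimal in-memory sketch. The class name `ColumnFamilyStore` is hypothetical, and the value passed to `set` is assumed to be a nested mapping of column family to columns, since the slide does not specify its shape.

```python
class ColumnFamilyStore:
    def __init__(self):
        # row key -> {column family -> {column name -> value}}
        self._rows = {}

    def set(self, key, value):
        # Merge the given families and columns into the row.
        # Column names are dynamic: each row may have different ones.
        row = self._rows.setdefault(key, {})
        for family, columns in value.items():
            row.setdefault(family, {}).update(columns)

    def get(self, key, *column_families):
        # Return only the requested column families for the row.
        row = self._rows.get(key, {})
        return {f: dict(row[f]) for f in column_families if f in row}

    def delete(self, key):
        self._rows.pop(key, None)

    def scan(self, prefix):
        # Rows are returned in key order, as in an ordered store.
        return [
            (key, self._rows[key])
            for key in sorted(self._rows)
            if key.startswith(prefix)
        ]


table = ColumnFamilyStore()
table.set("shop1#2024-01-01", {"sales": {"apples": 12}, "meta": {"open": True}})
table.set("shop1#2024-01-02", {"sales": {"pears": 7}})
table.set("shop2#2024-01-01", {"sales": {"apples": 3}})
shop1_rows = table.scan("shop1#")
sales_only = table.get("shop1#2024-01-01", "sales")
```

The rows hold different columns ("apples" in one, "pears" in another), which shows how sparse data fits the model, and a single `get` returns a row together with its related data in one request.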